

LABORATORY FOR COMPUTER SCIENCE



MASSACHUSETTS INSTITUTE OF TECHNOLOGY

LEVELY

MIT/LCS/TR-204

REAL-TIME CONTROL
STRUCTURES FOR BLOCK
DIAGRAM SCHEMATA

NOV 13 19TH

Thomas J. Teixeira

This research was supported by the Advanced Research Projects Agency of the Department of Defense and was monitored by the Office of Naval Research under contract no. N00014-75-C-0661

This document has been approved for public release and salo; its distribution is unlimited.

545 TECHNOLOGY SQUARE, CAMBRIDGE, MASSACHUSETTS 02139

DOC FILE COPY

AD A 0 61

SECURITY CLASSIFICATION OF THIS PAGE (When Date Entered) **READ INSTRUCTIONS** REPORT DOCUMENTATION PAGE BEFORE COMPLETING FORM 2. GOVT ACCESSION NO. 3. RECIPIENT'S CATALOG NUMBER 1. REPORT NUMBER MIT/LCS/TR-204 TYPE OF REPORT PERIOD COVERED M.S.Thesis - January 1978 Real-Time Control Structures for Block Diagram 6. PERFORMING ORG. REPORT NUMBER Schemata. MIT/LCS/TR-204 8. CONTRACT OR GRANT NUMBER(\*) AUTHOR(+) Thomas Z Teixeira NO 0014-75-C-0661 PERFORMING ORGANIZATION NAME AND ADDRESS PROGRAM ELEMENT, PROJECT, TASK AREA & WORK UNIT NUMBERS MIT/Laboratory for Computer Science 545 Technology Square Cambridge, MA 02139 11. CONTROLLING OFFICE NAME AND ADDRESS 12. REPORT DAT Advanced Research Projects Agency August 178 Department of Defense 1400 Wilson Bouleyard Arlington, VA 22209 rlington, . SECURITY CLASS. (of this report) 14. MONITORING AGENCY NAME & ADDRESS(If different from Controlling Office Office of Naval Research Unclassified Department of the Navy 15a. DECLASSIFICATION/DOWNGRADING Information Systems Program Arlington, VA 22217 16. DISTRIBUTION STATEMENT (of this Report) Approved for public release; distribution unlimited 17. DISTRIBUTION STATEMENT (of the abstract entered in Block 20, if different from Report) 18. SUPPLEMENTARY NOTES 19. KEY WORDS (Continue on reverse side if necessary and identify by block number) real-time scheduling priority scheduling deadline-driven scheduling multiple processor real-time systems STRACT (Continue on reverse side if necessary and identify by block number) Block diagram schemata model computation systems in the context of an external environment. The environment imposes various constraints on the real-time performance of any implementation of a block diagram schemata. The model is used to provide precise definitions of real-time performance. The portion of the implementation that affects the real-time performance is called the control This research investigates several strategies for synthesizing control structures to satisfy the external real-time specifications. The simplest strategy DD , FORM 1473 EDITION OF 1 NOV 65 IS OBSOLETE SECURITY CLASSIFICATION OF THIS PAGE (When Date Entered) 409 648

20. is to execute all the blocks in the diagram in some fixed order. Control structures of this type have been somewhat ignored for time critical applications. The synthesis problem is shown to be solvable in the sense that acyclic control structures do not need to be considered. A branch-and-bound synthesis algorithm is presented which requires exponential time in the worst case. Although no efficient synthesis algorithm was found, the conjecture that the problem is NP-complete is not proved.

The other strategy for implementing control structures makes use of the fact that in some applications the input values change at discrete times. Under this assumption, block diagram schemata are similar to traditional models of real-time computations. An efficient algorithm for assigning fixed priorities to independent tasks is presented that guarantees the real-time specifications will be met. This algorithm relaxes previous restrictions of the deadline for a task being coincident with its next request.

Finally, some of the issues involved with multiple processor control structures are discussed, although no specific algorithms are investigated.



## MIT/LCS/TR-204

# REAL-TIME CONTROL STRUCTURES FOR BLOCK DIAGRAM SCHEMATA

by

Thomas Joseph Teixeira

August 1978

Massachusetts Institute of Technology

Laboratory for Computer Science

Cambridge

Massachusetts 02139

#### REAL-TIME CONTROL STRUCTURES FOR BLOCK DIAGRAM SCHEMATA

by

#### Thomas Joseph Teixelra

This report is a minor revision of a thesis submitted to the Department of Electrical Engineering and Computer Science on January 30, 1978 in partial fulfillment of the requirements for the Degree of Master of Science.

#### **ABSTRACT**

Block diagram schemata model computation systems in the context of an external environment. The environment imposes various constraints on the real-time performance of any implementation of a block diagram schema. The model is used to provide precise definitions of real-time performance. The portion of the implementation that affects the real-time performance is called the control structure.

This research investigates several strategies for synthesizing control structures to satisfy the external real-time specifications. The simplest strategy is to execute all the blocks in the diagram in some fixed order. Control structures of this type have been somewhat ignored for time critical applications. The synthesis problem is shown to be solvable in the sense that acyclic control structures do not need to be considered. A branch-and-bound synthesis algorithm is presented which requires exponential time in the worst case. Although no efficient synthesis algorithm was found, the conjecture that the problem is NP-complete is not proved.

The other strategy for implementing control structures makes use of the fact that in some applications the input values change at discrete times. Under this assumption, block diagram schemata are similar to traditional models of real-time computations. An efficient algorithm for assigning fixed priorities to independent tasks is presented that guarantees the real-time specifications will be met. This algorithm relaxes previous restrictions of the deadline for a task being coincident with its next request.

Finally, some of the issues involved with multiple processor control structures are discussed, although no specific algorithms are investigated.

Key Words and Phrases: real-time scheduling, priority scheduling, deadline-driven scheduling, control structures

## **Acknowledgements**

Steve Ward has been indispensable as an advisor in transforming my rather hazy ideas about real-time programming into a workable research topic. His enthusiasm kept this research progressing at many points when I was stuck or otherwise sidetracked.

Thanks are due to John Pershing, Al Mok and Jay Wahld for their work in providing a test bed for some of the ideas expressed in this thesis. Al Mok has also been especially helpful with his knowledge about scheduling theory and algorithms.

The entire Domain Specific Systems Research group has at some time contributed to the computer facilities that made the actual production of this document as well as the research possible, especially John Pershing and Terry Hayes.

Finally, I wish to thank Gillian Telxelra for her emotional and moral support throughout the course of this research.

This research was supported by the Advanced Research Projects Agency of the Department of Defense and was monitored by the Office of Naval Research under Contract No. NO0014-75-C-0661.

# **Table of Contents**

| 1: Introduction                                                  | 6.  |
|------------------------------------------------------------------|-----|
| 1.1: Previous Work                                               | 6.  |
| 1.2: Statement of the Problem                                    | 10. |
| 1.3: Thesis Overview                                             | 11. |
| 2: Block Diagram Schemata                                        | 13, |
| 2.1: Real-Time Performance and Specifications                    | 15. |
| 2.2: Functionality of Blocks                                     | 18. |
| 2.3: Example                                                     | 21. |
| 3: Static Control Structures                                     | 25, |
| 3.1: Existence of Cyclic Control Structures                      | 27. |
| 3.2: Generating Real-Time Control Structures                     | 33. |
| 3.3: A Branch-and-Bound Method for Generating Control Structures | 38. |
| 3.3.1: Determining the Relative Frequency of Constraint Paths    | 38. |
| 3.3.2: Strategies for Combining Solutions                        | 40. |
| 3.3.3: Performance of the Algorithm                              | 45. |
| 3.3.4: Speeding up the Algorithm                                 | 48. |
| 3.3.5: Practical Experience                                      | 50. |
| 3.4: Heuristics for Generating Control Structures                | 52. |
| 4: Static Priority Interrupt Control Structures                  | 54. |
| 4.1: Dynamic Centrol Structures                                  | 54. |
| 4.2: Model for Static Interrupt Control Structures               | 55. |
| 4.3: Assigning Priorities to Independent Tasks                   | 60. |
| 4.4: More Complex Models                                         | 62. |
| 4.4.1: Scheduling Overhead                                       | 62. |
| 4.4.2: Non-preemptive Control Structures                         | 63. |
| 4.4.3: Non-Distinct Priorities                                   | 63. |
| 4.5: Applications to the Control Structure Problem               | 64. |
| 4.5.1: Chains of Independent Tasks                               | 64. |
| 4.5.2: More Complex Task Relations                               | 65. |
| 4.5.3: Combining Static and Dynamic Control Structures           | 67. |
| 5: Multiple Processor Control Structures                         | 69. |
| 5.1: Assigning Control Structures to Multiple Processors         | 72. |
| 5.2: Dynamic Assignment of Processors                            | 74. |
| 6: Summary and Conclusions                                       | 75. |
| References                                                       | 78. |

# List of Figures

| 2-1: A block diagram schema requiring a multi-processor control structure | 17. |
|---------------------------------------------------------------------------|-----|
| 2-2: A Block Diagram Containing a Cycle                                   | 20. |
| 2-3: Typical block diagram schema                                         | 21. |
| 2-4: Latencies for static control structures                              | 22. |
| 2-5: Latencies for dynamic control structures with static schedulers      | 24. |
| 3-1: Typical Laxity Table                                                 | 31. |
| 3-2: Counter-Example to Least Laxity Scheduling                           | 37. |
| 3-3: Block Diagram Where All Constraints Appear More Than Once            | 41. |
| 3-4: Regions of a Critical Window                                         | 45. |
| 3-5: Counter-Example to Slack as a Dominance Relation                     | 46. |
| 4-1: Counter-example to priority = 1 / latency                            | 61. |
| 5-1: A simple multi-processor control structure                           | 70. |

### Real-Time Control Structures for Block Diagram Schemata

### 1: Introduction

There are many applications for computers where the real-time performance of the program is critical. These applications all involve asynchronous interaction with the external environment and it is this environment that imposes the real-time restrictions. For example, device drivers in operating systems must respond to interrupts before the information is lost. Another application is in direct digital control and process monitoring.

However, most high-level languages are not designed for producing time critical programs. The languages allow the user to define appropriate functional and data abstractions for his problem, but have no notion of real-time or asynchronous interaction with the real world. Instead, the user must design a control structure for his problem suitable for a single sequential process that will satisfy all the real-time constraints.

#### 1.1: Previous Work

Many operating systems do have notions of real-time and external input and output, but they are supported at a fairly low level [19, 20]. The application program typically has to deal with priorities, setting real-time alarms, and responding to interrupts. These actions may be necessary to satisfy the constraints, but they to not bear a close relationship to the constraints. For example, it is seldom abvious what priority must be assigned to a task that must complete in ten milliseconds and uses one millisecond of CPU time.

Early work on applications oriented real-time operating systems was done by

Previous Work Section 1.1

Fiala [5]. Fiala proposed a model of real-time processes characterized by three parameters per process.

- (1)  $P_i$  the maximum CPU time used by process i.
- (2)  $D_i$  the maximum delay allowed from the time process i requests service to the completion of servicing that request.
- (3) I, the minimum period between requests for process I.

Fiala proposes three scheduling algorithms for this model. The first (and simplest) executes the process that must complete the soonest. i.e. the process with the earliest deadline. This algorithm is optimal in the sense that if any schedule satisfies the deadline requirements for all the processes, so does the earliest deadline schedule. However, this result is proved in the context of process switching requiring negligible overhead.

Fiala's second algorithm is a modification of the earliest deadline scheduler that minimizes the number of process switches while retaining the optimality condition of the earliest deadline algorithm. This is accomplished by having the scheduler check to see if the current process must be preempted when a process with an earlier deadline requests service. This is done by simulating the action of the scheduler on the current requests. Unfortunately, this algorithm would require extensive computation whenever a process requests service. Accordingly, Fiala's third algorithm pre-computes a lower bound on the expression required by the minimum switching algorithm. With the lower bound, the extra computation required by the third algorithm requires an extra comparison at process request time. The algorithm is also optimal in the same sense and requires less overhead than the simpler

Previous Work Section 1.1

earliest deadline elgorithm.

However, Fiala makes no attempt to integrate his model and scheduler into a real-time language system. One such approach is control robotics developed by Dertouzos [3] and Geiger [6]. A control robotics program is organized as a set of daemons which continuously monitor some condition and execute the body (a corrective procedure) when the condition is true. The real-time specifications for a daemon are the delay from when a condition becomes true to when the program detects the condition (the recognition time) and the delay from detecting a condition and executing the body (the response time). Geiger's implementation of control robotics periodically samples the condition with a period slightly less than the recognition time (the slightly higher rate will allow for preemption by other daemon conditions). The daemon bodies are scheduled using an earliest deadline scheduler.

One weakness of control robotics is that no guarantee of satisfying the real-time constraints is made at compile time. This could be done if the user declared a minimum period between executions of a daemon body and the compiler determined the computation time of the daemon bodies. Since it is impossible to determine the computation time for an arbitrary procedure, the compiler may require declarations to determine the computation time.

A more substantial problem of Geiger's implementation is the assumption that the conditions for daemons are independent of the execution of other daemon bodies. Therefore, complex structures of daemons whose conditions depend on variables changed by other daemons could result in much unnecessary computation. All in all, control robotics does not provide any more of a model for real-time programming

Previous Work Section 1.1

than Flala's work beyond suggesting some syntax for identifying tasks and specifying their deadlines.

Another system that deals with real-time specifications at the user level is TOMAL (Task Oriented Microprocessor Language) [12]. On the surface TOMAL is a combination of a modern block structured programming language and a typical minicomputer 'real-time' operating system. However, in addition to assigning static priorities to tasks, a response time may be specified for a task. This response time is similar to the recognition time for control robotics and specifies the maximum delay between a request for a task activation and the initiation of that task. Another feature of TOMAL is that interrupt routines only request task activation and do not respond to the interrupt in any substantive way. This reduces the amount of object code that does not run under the task scheduler and allows the TOMAL system to check the consistency of the real-time constraints for the entire system. However, TOMAL makes no attempt to verify real-time specifications on service times for tasks.

Data flow schemata deserve mention as a real-time system since one proposed applications is digital signal processing [2, 22]. It is designed to facilitate highly parallel computation and statements may be executed as soon as all their input variables have been computed. If several statements are executable an arbitrary statement is chosen. However, with the addition of real-time constraints to mediate this decision, data flow would be a powerful real-time system. The other major drawback of data-flow is that is not suited for implementation on conventional computer architectures.

#### Statement of the Problem

1.2:

The goal of this research is to develop theory that is applicable to the implementation of a programming system designed to the restricted domain of time-critical applications. The main criterion of the suitability of the language to this domain should be that small changes in the real-time specifications should result in small, obvious changes in the source program. It is conceivable, and indeed desirable, that these changes could have a dramatic effect on the object program produced. This reorganization of the object program is precisely the process that should be automated.

Conventional languages already provide facilities for functional and data abstraction, and numerous researchers are already working in this area. Therefore, this research will focus on the global control structure for programs. This includes issues such as the number of processors to use in an implementation, deciding what interrupt structure (if any) is necessary, decomposing the program into tasks, and assigning parameters required by the appropriate task scheduler.

Since normal language semantic issues are being avoided, the description of a program can be made extremely simple. The intuitive model for a real-time program is that of continuous time analog block diagrams. The graph defines a precedence relation among operators identical to the data flow in the diagram. The program will be specified as a directed graph of actions to be performed and their functional dependence, with arcs of the graph representing data paths. The graph must be acyclic since cycles in a block diagram represent feedback systems. Automatically producing an object program that solves the feedback equation would require more detailed semantics for the programs as well as other disciplines outside the scope

of this research. However, in some special cases, cycles can be handles by rearranging the block diagram. A strict upper-bound must be placed on the computation time required for each action. The real-time constraints specify upper bounds of the propagation delays through the block diagram and of the bandwidths of the input and output signals.

### 1.3: Thesis Overview

Chapter 2 develops the block diagram model of computation. The block diagram model is a program schematic model similar to data flow. However, real-time and an external environment are explicit in the model. In addition, the block diagram model separates the data-flow of the schema from the control flow, which is embodied in the control structure. The control structure specifies the execution order of the blocks at object time. The research problem may be formalized as finding control structures for block diagram schemas which satisfy the given real-time specifications. The major use of the model is to define the semantics of the real-time specifications.

Chapter 3 investigates various static control structures (control structures that are independent of the data values at object time). Although static control structures may be used widely in specific applications (particularly in small, dedicated systems such as those implemented on microcomputers), they have been ignored by designers of real-time programming systems, mainly because their real-time performance in the general case has not been studied.

Chapter 4 investigates extended semantics where the external inputs do not change continuously. In this situation, a dynamic control structure may be used. A

Thesis Overview Section 1.3

dynamic control structure is a control structure that does depend on the data values at object time. The chapter investigates a subclass of dynamic control structures, namely static priority interrupt control structures. The prototypical example is an interrupt system where the system does nothing until an input changes, although it includes systems without physical interrupts where the inputs are sampled. The priorities are static as opposed to the earliest deadline scheduler where the priority of a task is a function of time.

Chapter 5 discusses some of the Issues that arise when more than one processor is available for the implementation. The real-time performance of multiprocessor systems are analyzed and the real-time performance of a block diagram schema is bounded. Some techniques for distributing the processing among several processors are suggested, although specific algorithms are not studied.

Most models of computation do not capture the notion of a "real-time" system which monitors continuously changing inputs from some external environment. Block diagram schemata model the external environment explicitly and recognize the existence of real-time specifications placed by the environment on the computing mechanism. They are based on the intuitive model of the conventional analog block diagram whose inputs and outputs are changing continuously. An (m,n) block diagram schema consists of an (m,n) block diagram module, a control structure, a configuration and an environment which manipulates the configuration asynchronously with the control structure. Within the model, it is assumed that values change continuously. Obviously, the computations cannot be performed continuously on a digital computer. The real-time specifications determine how often the control structure must compute new values, as well as how fast it must compute them.

An (m,n) block diagram module is a directed graph whose nodes are either blocks or links. The terms predecessor and successor will be used with the conventional definitions. Data is stored in the links while the blocks perform the actual computation. Accordingly, only one arc may point to each link. The graph must be proper in the sense that arcs may not point from links to links or from blocks to blocks. Upper-case letters will be used to denote blocks and lower case letters to denote links. The predecessor of a link is called the specifier of that link and the successors of a link are called the watchers of the link. The predecessors and successors of a block are called the inputs and outputs of the block respectively.

An (m,n) module has m links with no input arcs (input links) and n links with no output arcs (output links). The input links receive their values from an external,

continuous time function called the *input signal*. The values at the output links define an external, continuous time function called the *output signal*.

The model assumes the existence of a global clock which defines the passage of real time. Hewitt argues against the use of global clocks since they cannot be implemented in distributed systems [9]. While Hewitt's objections against global clocks are valid, assigning times within Hewitt's framework of local orderings would be more complicated. This complexity is unnecessary since the events being timed are always ordered by one of Hewitt's local orderings.

A configuration is an assignment of tokens to the links of a schema. The token contains a value and a set of labels of the form (link, birth). These labels indicate when the token arrived at the input link link. Each link always contains some token, since signals are always defined in a continuous time block diagram.

The computation of a block diagram schema is described by a series of anapshots. A snapshot consists of a block diagram module and an associated configuration. The initial snapshot assigns null values to all tokens except for tokens on the input links of the schema which are assigned the current value of the input signal. The label set of all links is initialized to {(/ink, 0)}. The computation proceeds from one snapshot to the next through the firing of blocks. The control structure is the strategy for choosing which block to fire next. The fired block accesses the tokens on its input links, and replaces the tokens on its output links. The label set for the output token becomes the union of the old label set of the token and the label sets that were assigned to the tokens on all the input links of the block. The time in the label (/,t) for the link / at each input arc of the fired block is replaced by the label (/,t) where time is the current

contents of the global clock. This action occurs after any tokens have been replaced on the output links, but the time for the new label sets is immediately after the input tokens were accessed. In addition, if / is an input link, its value is set to the current value of the input signal. The block need not replace any output tokens. This differs from data flow since tokens are not removed from the input links after a block is fired. The data flow restriction is not appropriate since the value of a token is defined at all times.

The amount of computation time used by block A is denoted  $t_A$ . If the control structure fires block A on some processor at time t, that processor will complete and replace the output tokens on that block by the time  $t+t_A$ . The computation times used will be upper bounds either computed by whatever language processor is used to create the primitive blocks or declared by the user.

## 2.1: Real-Time Performance and Specifications

A block diagram schema is an approximation to a continuous time block diagram. There are many factors affecting the quality of the approximation. However, the factors influenced by the control structure are how long the schema takes to compute the values of output tokens from the input tokens, and how often it performs these computations. The real-time specifications will place bounds on these quantities. A control structure that satisfies all the real-time specifications is called a feasible control structure.

The age of a token with respect to a link l at time t is defined as  $t-t_0$  if  $(l, t_0)$  is in the label set of the token, and undefined otherwise. The latency between

links a and b is denoted  $\mathbf{I}_{a,b}$  and is the upper bound of the age at any time of tokens at b with respect to link a. The user can specify an upper bound on the latency between two links. The first link will be an input link of the schema and the second link will be an output link.

Latency specifications can also be expressed in terms of continuous-time functions:

$$\hat{b}(t) = F(a(t-\Delta(t)), \cdots), \ \Delta(t) \leq I_{a,b}$$
 (2-1)

Here  $\hat{b}(t)$  is the function whose value is the value of the token at link b at time t; a(t) is the function whose value is the signal at link a at time t;  $\Delta(t)$  corresponds to the age of the tokens at link b. Notice that  $\Delta(t)$  is generally <u>not</u> constant, but is bounded. The user knows how close  $\hat{b}(t)$  must be to  $b(t) = F(a(t), \cdots)$ . Using information about the magnitude of F and a and their derivatives, the user can use equation (2-1) to calculate the latency specifications necessary to achieve the desired accuracy of  $\hat{b}(t)$ .

The other measure of real-time performance is how often new values are computed. The bandwidth from link a to link b (notation  $B_{a,b}$ ) is the maximum rate at which the control structure must compute new values at b from values at a. The bandwidth specification is not easily expressible in terms of continuous-time functions. It may be thought of as a requirement on how often the value of  $\hat{b}(t)$  must change.

The bandwidth specification may seem superfluous since the latency specifications also implies how often the value of  $\hat{b}(t)$  changes. However, it is possible for a multiple processor control structure to exhibit bandwidth performance

that exceeds the rate implied by the latency specification. An example is shown in figure 2-1.



t<sub>A</sub> = 10msec

t<sub>B</sub> = 10msec

B<sub>a,c</sub> = 75/sec

la,c = 40msec

A block diagram schema requiring a multi-processor control structure Figure 2-1

In this example, both A and B require ten milliseconds of computation time. A single processor control structure that executes  $ABABAB \cdots$  can guarantee a latency from a to c of forty milliseconds and a bandwidth from a to c of fifty per second. However, if processor one executes  $AAA \cdots$  and processor two executes  $BBB \cdots$ , then the latency from a to b is still only forty milliseconds but the bandwidth increases to one hundred.

While the block diagram model is useful for defining performance for real-time programs, it does not yield many insights into the problem of synthesizing a feasible control structure. The graph itself resembles a partial order on a set of tasks, but the semantics of block diagram schemata are not as restrictive as this partial order. In most schematic models, a task must not be executed until all its predecessors have been executed since (presumably) it would not have data available at all its inputs. The block diagram model has no such restriction and as

a result is able to execute some parts of the schema more often than other parts.

On the other hand, there are certain execution orders that can be ruled out since they are obviously irrefficient. For example, once a block has been fired, it need not be fired again until one of its predecessors has been fired again since all its inputs will be unchanged. Therefore, it outputs will not change. Similarly, if no successor of a block A is fired between firings of A, the previous execution of A was unnecessary since no block looked at the previous values of the tokens on the output links of A.

If these restrictions are combined, each firing of a block must be surrounded (in time) by at least one predecessor and at least one successor. Equivalently, the allowable execution sequences may be found by shuffling all the paths from an input link to an output link. These paths will be referred to as constraint paths or just constraints.

# 2.2: Functionality of Blocks

The semantics of block diagram schemata make some useful block functions awkward to implement. For example, a block that performs differentiation is essential for applications in real-time process monitoring and control. In classical direct digital control, the system is discretized by sampling at some specific period. Differentiators are replaced by unit delays and the feedback gains are adjusted appropriately. This is possible only because the inputs are sampled at a known frequency.

In block diagram schemata there is no guarantee of periodic execution. The bandwidth specifications set a lower bound on how often a block must be

executed, and a different lower bound may be implied by the latency specifications. They do not place any upper bound on how often the block is executed. Therefore, it is impossible to tell a priori when and how often a block will be executed. This would seem to rule out any blocks that would require state variables, but this is not true. A white noise generator could be implemented using a pseudo-random number generator. This would use a state variable, but it would not run into any problems by not knowing how often it is executed. But most other functions that need to produce or transform a time dependent sequence of values will be impossible to implement.

The only general solution to the problem is to have a real-time clock as part of the system. Then a differentiation block could remember both its previous input and the time it was last executed and compute the obvious first order approximation. The major difficulty is that the real-time clock would have to provide much finer resolution than the 60 cycle clocks found in typical computer systems.

The user should be able to define his own time dependent functions since any selection of primitive blocks will probably turn out to be too limited for some application. Therefore, it becomes necessary to provide some primitive blocks which would probably lead to nonsensical programs if used carelessly. In particular, if the user had a unit delay block and access to the real-time clock he could define arbitrary approximations to differentiators, although undisciplined used of the unit delay block would result in useless programs.

Implementing integration would still be a problem since the block diagram for a first order integrator would contain a cycle (see figure 2-2). The problem with cycles is that it is unclear whether the cycle represents use of a state variable,

as in data flow, or implied solution of simultaneous equations, as in continuous time block diagrams. In the case of integrators it is clear that the cycle represents use of a state variable, since the cycle contains a unit delay block. In this case, the cycle can be broken at the input to the delay block. The delay block is treated as a watcher of link e, even though it gets its input from link f. This transformation alters the order in which the blocks are executed by changing the constraint paths. Unit delays were handled by a similar transformation in BLODI [11], a system for simulating discrete time block diagrams, and would be handled in the same way by a programmer [21].



A Block Diagram Containing a Cycle Figure 2-2

2.3: Example

The interaction between the real-time specifications and the control structure can be illustrated by a series of examples. In these examples the block diagram module is left unchanged while the latency and bandwidth specifications are varied. These variations will necessitate changes in the control structure used to implement the block diagram schema. The block diagram module itself is shown in figure 2-3.



Typical block diagram schema Figure 2-3

The simplest control structures to consider are cycles that repeatedly execute the blocks in some fixed order. There 3! (= 6) ways of executing four blocks once per cycle (ignoring starting transients). For a small example like this it is feasible to enumerate all such cycles and test them to see if they satisfy the latency

constraints <sup>1</sup>. All these control structures are independent of when new tokens actually arrive. The worse-case assumption is that a new token arrives immediately after the previous token is marked old. This assumption is used in calculating worst-case latencies, which are shown in figure 2-4. Notice that although ABCD is better than ACBD and ADBC is better than ADCB, there is no best control structure. In fact, we can choose latency specifications such that only one of the control structures will work. The first six control structures in figure 2-4 sample the inputs once per cycle, i.e. once every 30 time units. However, if any of the bandwidths  $\mathbf{B}_{\mathbf{a},\mathbf{c}}$ ,  $\mathbf{B}_{\mathbf{a},\mathbf{f}}$  or  $\mathbf{B}_{d,\mathbf{f}}$  is greater than 1/30 then some other control structure must be used.

| Control<br>Structure | la,c | I <sub>a,f</sub> | l <sub>d,f</sub> |  |
|----------------------|------|------------------|------------------|--|
| ABCD                 | 45   | 60               | 45               |  |
| ACBD                 | 55   | 60               | 50               |  |
| ACDB                 | 60   | 55               | 45               |  |
| ADCB                 | 60   | 45               | 60               |  |
| ADBC                 | 50   | 45               | 55               |  |
| ABDC                 | 45   | 50               | 60               |  |
| ABDCD                | 50   | 55               | 50               |  |
| ADBCD                | 55   | 50               | 50               |  |
| ABCABD               | 40   | 65               | 75               |  |
| ACDBCD               | 75   | 70               | 40               |  |
| ADBADC               | 65   | 40               | 70               |  |

Latencies for static control structures Figure 2-4

A slightly more complicated class of control structures is cycles where some blocks may be executed more than once. For example, the control structure

<sup>1.</sup> However, such an algorithm is not practical since the computation time taken by such an algorithm would grow exponentially with the number of blocks.

ABCABD has worst-case latencies as shown in figure 2-4. This control structure will satisfy its bandwidth constraints if  $B_{a,C}$  is less than one every twenty time units and  $B_{a,f}$  and  $B_{d,f}$  are less than one every forty-five time units.

The next class of control structures to consider are dynamic control structures with static priority scheduling. These control structures make use of the current environment to determine which blocks to fire next. The dynamic control structures assume that the values of tokens at input links do not change continuously. When the value of a token at an input link changes, a request is made for a set of tasks. The request is serviced by firing a fixed sequence of blocks as specified by the task. Since the processor is generally busy when a request occurs, the requests are remembered until the processor is idle, when one of the requested tasks is selected to be executed. Each task is assigned an integer priority. The task with the highest priority is serviced next. The scheduler is static since the priority for a task is always the same relative to other tasks. The earliest deadline scheduler is an example of a dynamic priority scheduler, since the priority of a task depends on its current deadline. If the task being serviced can be temporarily suspended, the control structure is preemptive.

A dynamic control structure need not be interrupt driven. For example, the control structure could sample the inputs between executing blocks. However, preemptive control structures cannot be implemented without interruputs.

In the example of figure 2-3, there are many ways to construct tasks to be requested by changing inputs. One such task system is to fire ABD (or ADB) when the value at a changes, and CD when the value at d changes. The worst case occurs when the values at a and d change simultaneously. The latencies for this

case are shown in figure 2-5. These latencies can be sustained only if the bandwidths at a and d are both less than once every 35 time units (otherwise the control structure would fall behind). In a sustained worst case, new tokens arrive once every 35 time units. A trace of block firings would seem to indicate that the static control structure ABDCD is being executed, which has latencies 15 to 20 units larger than those for the dynamic control structure. However, in the dynamic case it is known exactly when the input signal change. In particular, the processor will be idle if more than 35 time units elapse between a change in input signals, so the processor will be able to respond to a change immediately. In a static control structure, the change would not be responded to until the control structure gets around to it.

| Tas        | k String |      |      |                  |  |
|------------|----------|------|------|------------------|--|
| Priority 2 |          | la,c | la,f | l <sub>d,f</sub> |  |
| ABD        | CD       | 30   | 35   | 15               |  |
| ADB        | CD       | 35   | 30   | 15               |  |
| CD         | ABD      | 15   | 20   | 35               |  |
| CD         | ADB      | 20   | 15   | 35               |  |

Latencies for dynamic control structures with static schedulers Figure 2-5

The main function of the control structure in a schema is to specify when to fire each block. If the control structure is independent of the configuration (i.e. unaffected by changes made by the environment) it is a *static* control structure. An example of a static control structure is a loop which fires all of the blocks in the schema cyclically. Control structures which make use of configuration (e.g. via interrupts) are called *dynamic* control structures.

The latency specification from a to b will be satisfied only if all the blocks along all paths from a to b are fired at least once during each time interval of duration  $I_{a,b}$  time units. Otherwise there would be time intervals longer than  $I_{a,b}$  when the riabel at b will not change and therefore the age with respect to a of the token at b will be greater than  $I_{a,b}$ . Similarly, the bandwidth specification from a to b will be satisfied if and only if the interval between firing the blocks along the constraint paths is less than  $1 / B_{a,b}$ .

For single processor control structures it is possible to construct a trace of the blocks that are fired by the control structure. The trace is a string over an alphabet  $\Sigma$  whose elements correspond to the blocks of the schema. Each element A of  $\Sigma$  is assigned a weight (notation |A|) equal to  $t_A$ . The weight of a string is defined to be the sum of the weight of its elements. A string  $S_1$  contains  $S_2$  if all the elements of  $S_2$  appear in  $S_1$  in the order they appear in  $S_2$ . For example, the string ABCDE contains the string BD, even though BD is not a substring of ABCDE. Regular expressions will be used to denote sets of strings. In particular, if S is a string,  $S^*$  denotes the set of strings  $S_1$ ,  $S_2$ ,  $S_3$ ,  $S_4$  as well as the empty string.

It is necessary to model intervals in continuous time of arbitrary origin and duration, since the latency specifications require <u>all</u> intervals of specific duration to

contain the corresponding constraint path. Therefore the weight of the initial and final elements of a string may be counted at less than their nominal weights. For example, if  $|a_1 \cdots a_k| = w$  (weighting  $a_1$  and  $a_k$  at  $|a_1|$  and  $|a_k|$ ), then  $[a_1 \cdots a_k]$  is a string of weight less than w since both  $a_1$  and  $a_k$  are weighted at less than  $|a_1|$  and  $|a_k|$ . However, if the initial or final elements do not have full weights, the may not be included as part of any contained string. Weighting these elements at less than their full values corresponds to shrinking an interval of size w in continuous time: if the interval starts after  $a_1$  starts executing, then the interval does not contain  $a_1$  reading its inputs. A string will be preceded by a '[' or followed by a ']' if the first or last element in the string is weighted at less than its nominal value.

A single processor static control structure is completely specified by its trace, which is determined at compile time (hence the name stat/c control structure). The real-time specifications on the control structure can be rephrased as constraints on its trace. In particular, the latency specification from a to b is satisfied if and only if all the constraint paths from a to b are contained in every substring in the trace of weight  $I_{a,b}$ . The bandwidth specification is satisfied if and only if the weight of all substrings between occurrences of the constraint paths are less than  $1 / B_{a,b}$ .

At this point it is possible to deal exclusively with the trace of the control structure and the constraint paths. Constraint path i will be denoted  $C_i$  with latency specification  $I_i$  and bandwidth specification  $B_i$ . If  $C_i$  is a path from a to b,  $I_i = I_{a,b}$  and  $B_i = B_{a,b}$ . It will also be necessary to deal with the tails of the

constraint paths. If  $C_i = c_{i,1}c_{i,2}\cdots c_{i,n}$ , where  $c_{i,j}\in\Sigma$  then the jth tail of  $C_i$  is  $C_{i,j}=c_{i,j}c_{i,j+1}\cdots c_{i,n}$ .

Since the control structure must satisfy the real-time specifications for all time, the trace corresponding to the control structure will be a infinitely long string. Since the control structure can be implemented only if the trace can be generated using a finite program, it would be very awkward if the only feasible control structures were acyclic. Fortunately, it can be proved that if any feasible control structure exists, then there exists a feasible control structure that fires the blocks in some cyclic order.

### 3.1: Existence of Cyclic Control Structures

The theorem proved in this section can be stated as: Suppose there exists a string  $\omega = a_1 a_2 a_3 \cdots \in \Sigma^n$  such that  $\omega$  satisfies the real-time constraints. Then there also exists a finite string  $\beta$  such that the string  $\beta^n$  also satisfies the real-time specifications.

This theorem will be proved using several lemmas.

<u>Definition</u>: A critical window of a control structure  $\omega$  for the constraint  $C_i$  is a substring  $\psi_i = a_k \cdots a_m$  of  $\omega$  that contains two occurrences of  $C_i$ , but  $[\psi_i]$  contains no occurrences of  $C_i$ .

The most critical window for C<sub>j</sub> is the critical window with the greatest weight.

Lamma 3-1: The string  $\omega$  satisfies the latency specifications for C, if and only if  $|\psi_j| \le 1$ , for the most critical window  $\psi_j$  in  $\omega$ .

Proof:

only if: Assume  $\omega$  satisfies the real-time specifications. Then any substring of  $\omega$  of weight  $I_i$  contains  $C_i$ . In particular, the substring  $[a_k \cdots a_m a_{m+1}]$  of weight  $I_i$  must contain  $C_i$ . Since  $[\psi_i]$  does not contain  $C_i$ , the substring  $[\psi_i]$  of weight  $I_i$ — $\epsilon$ , where  $\epsilon$  is arbitrarily small contains one occurrence of  $C_i$ . Therefore,  $|\psi_i| \le I_i + \epsilon$ ,  $\epsilon > 0$ .

if: Assume the most critical window  $\psi_i$  has weight greater than  $I_i$ . Let  $\gamma$  be any substring of  $[\psi_{i,1}]$  where  $|\gamma| = I_i$ .  $\gamma$  exists since:

Since  $\psi_i$  is a critical window, then  $[\psi_i]$  contains no occurrences of  $C_i$ . But  $\gamma$  is a substring of  $[\psi_i]$  and also does not contain  $C_i$ . Hence,  $\gamma$  is a substring of  $\omega$  of weight  $I_i$  that does not contain the constraint path. Therefore,  $\omega$  does not satisfy the latency specifications.

<u>Corollary:</u> Since  $\psi_i$  contains two occurrences of  $C_i$ , the period between successive occurrences of  $C_i$  must be less than  $I_i - |C_i|$ .

This lemma shows there is a time limit between the starts of successive occurrences of C<sub>i</sub>. The bandwidth specifications directly limit this interval. Therefore, it will be assumed that the latency specifications are more severe than the bandwidth specifications. If not, the latency specifications can be adjusted so that:

$$|I_j \leq \frac{1}{B_j} + |C_j|$$

The time remaining until the start of the next appearance of a constraint path is called the *laxity* of that constraint. Given a control structure, we can construct a table of laxities for each position in the corresponding string which the property that the table entries are non-negative if and only if we satisfies the latency specifications. The only difficulty is in accurately determining the start of an occurrence of a constraint string. This will be handled by keeping laxities for the -28-

tails of the constraint strings. The true laxity for a string will be reflected in the laxities of its tails if the start of the constraint path is falsely identified.

An element of the table d[i,j,k] is the laxity for the path  $C_{i,j}$  just before  $a_k$  is fired. The table should be thought of as rectangular with columns labeled by elements of  $\omega$ . The entries in the first column are:

$$d[I,J,0] = I_{J} - |C_{J,J}|$$
 (3-1)

since the constraint path  $C_i$  must occur by  $I_i - |C_{i,j}|$ . The remaining columns can be filled in by simple recursion rules.

If the next element in  $\omega$  is not the same as the first element in a constraint path, the laxity for that path decreases by the weight of that element:

$$a_k \neq c_{i,j} \Rightarrow d[i,j,k+1] = d[i,j,k] - |a_k|$$
 (3-2)

There are two possibilities if the next element in the solution is the same as the first element in a constraint path. If this is the start of an occurrence of a constraint path, the laxity for the tail of that path should be no more than the current laxity for the constraint path. It is possible that the tail will already have a more severe laxity since different constraint paths can have identical tails. In addition, the laxity for the whole constraint path will become the original limit the instant after the first element appears. Therefore, the laxity becomes the original laxity minus the weight of the first element.

However, if  $a_k$  is <u>not</u> the start of an occurrence of  $C_{i,j}$ , the laxity should decrease by  $|a_k|$ . Fortunately, this problem will be handled automatically by assuming that an occurrence of  $C_{i,j}$  starts whenever  $a_k = c_{i,j}$ . If it is not part of an occurrence of  $C_{i,j}$ ,  $c_{i,j}$  will appear again before all of  $C_{i,j}$  appears. When this

happens, the laxity for  $C_{i,j+1}$  will have decreased by the amount the laxity for  $C_{i,j}$  should have decreased if the start of the path had not been incorrectly identified. When  $c_{i,j}$  appears again, the laxity for  $C_{i,j+1}$  will be less than the laxity for  $C_{i,j}$ . Therefore:

$$a_{k}=c_{i,j} \Rightarrow \begin{cases} d[i,j+1,k+1] = min(d[i,j,k],d[i,j+1,k]-|a_{k}|) \\ d[i,j,k+1] = l_{i}-|C_{i,j}|-|a_{k}| \end{cases}$$
(3-3)

Equations (3-2) and (3-3) can be transformed to produce rules for computing the k+1st column of the laxity table from the kth column:

$$d[i,j,k+1] = \begin{cases} 1_{j} - |C_{i,j}| - |a_{k}| & \text{if } a_{k} = c_{i,j} \\ min(d[i,j-1,k], d[i,j,k] - |a_{k}|) & \text{if } a_{k} = c_{i,j-1} \\ d[i,j,k] - |a_{k}| & \text{if } a_{k} \neq c_{i,j}, c_{i,j-1} \end{cases}$$
(3-4)

As an example, figure 3-1 shows the laxity table for the control structure ABCD and the block diagram module from figure 2-3.

In this table, the laxities at time 60 are identical to the laxities at time 30. The next column in the table would be identical to the column at time 40. The rest of the table becomes periodic, and all the entries are non-negative. The periodicity allows us to prove that  $(ABCD)^{**}$  will satisfy the latency specifications for all time. This is formalized in the following lemmas:

Lemma 3-2: If:

$$\forall_{i,j} d[i,j,m] \ge d'[i,j,k]$$
 and  $a_k = a'_m$ 

then:

|    | (O) | B<br>(10) | C<br>(15) | D<br>(25) | A<br>(30) | B<br>(40) | C<br>(45) | D<br>(55) | A<br>(60) | (70) |
|----|-----|-----------|-----------|-----------|-----------|-----------|-----------|-----------|-----------|------|
| AB | 30  | 20        | 15        | 5         | 0         | 20        | 15        | 5         | 0         |      |
| В  | 40  | 30        | 35        | 25        | 20        | 0         | 35        | 25        | 20        |      |
| CD | 30  | 20        | 15        | 20        | 15        | 5         | 0         | 20        | 15        | S    |
| a  | 40  | 30        | 25        | 15        | 35        | 25        | 20        | 0         | 35        |      |
| AD | 45  | 35        | 30        | 20        | 15        | 35        | 30        | 20        | 15        |      |
| D  | 55  | 45        | 40        | 30        | 50        | 15        | 10        | 0         | 60        |      |

$$t_A = 10 t_B = 5 t_C = 10 t_D = 5$$
  
 $t_{AB} = 45 t_{CD} = 45 t_{AD} = 60$ 

Typical Laxity Table Figure 3-1

$$\forall_{j,j} d[i,j,m+1] \geq d'[i,j,k+1]$$

Proof: From case analysis of (3-4) and elementary algebra.

Lemma 3-3: Let:

$$\alpha = a_1 \cdots a_{k-1}$$

$$\beta = a_k \cdots a_{m-1}$$

$$\gamma = a_m \cdots$$

If w - aby satisfies the latency specifications and:

$$\forall_{i,j} d[i,j,k] - d[i,j,m]$$

then:

also satisfies the latency specifications.

Proof: Construct the laxity table d' for w':

$$V_{i,j} d[i,j,0] = i_i - [C_{i,j}] - d[i,j,0]$$

Since a 1 - a'1, (3-4) leads to:

$$V_{i,j} d'[i,j,1] = d[i,j,1]$$

Similarly:

$$\forall_{i,j,l \le m} d^{r}[i,j,l] = d[i,j,l] \ge 0$$
 (3-5)

Therefore

$$V_{i,j}d'[i,j,m] = d'[i,j,k]$$

From lemma 3-2:

$$\forall_{i,j} d^{*}[i,j,m+1] \ge d^{*}[i,j,k+1]$$
  
=  $d[i,j,k+1] \ge 0$ 

Similar reasoning will show:

$$\forall_{i,j} d^{i}[i,j,2m-k-1] \ge d^{i}[i,j,m-1]$$
  
=  $d[i,j,m-1] \ge 0$ 

Now  $a'_{2m-k} = a_m$ , so lemma 3-2 still applies:

$$\forall_{i,j} d'[i,j,2m-k] \ge d[i,j,m] \ge 0$$

Inductively:

$$\forall_{i,j,l\geq m} d[i,j,l+m-k] \geq d[i,j,l] \geq 0$$
 (3-6)

Combining (3-5) and (3-6):

$$\forall_{i,j,l} d[i,j,l] \geq 0$$

Therefore, from lemma 3-1, w' satisfies the latency specifications. ■

<u>Corollary:</u> Let  $\alpha = a_1 \cdots a_{k-1}$ ,  $\beta = a_k \cdots a_{m-1}$ , and  $\gamma = a_m \cdots$ . If  $\omega = \alpha \beta \gamma$  satisfies all the latency specifications and d[i,j,k] = d[i,j,m] for some k < m, then  $\alpha \beta^k$  also satisfies the latency specifications. The proof is by induction.

The main theorem can now be proved by showing that any laxity table will have duplicate columns and applying lemma 3-3:

<u>Theorem 3-4:</u> If any string  $\omega$  satisfies the latency specifications then there exists a string of the form  $\beta^*$  which also satisfies the latency specifications.

Proof: Construct the laxity table for  $\omega$ . There are a finite number of possibilities for each table entry since each entry is  $\mathbf{l}_i - |\mathbf{C}_i|$  minus a sum of a finite number of  $|a_k|$ 's. The number of different  $|a_k|$ 's is limited by the number of blocks in the block diagram schema. The number of terms in the sum must be finite since each  $|a_k|$  is greater than zero and the laxity entry is also greater than or equal to zero. Therefore, the possibilities for each column are limited and eventually some column in the table will be repeated and k and m satisfying the conditions of lemma 3-3 exist.

Applying the corollary to lemma 3-3 says a solution of the form  $a\beta^*$  exists. However,  $d[i,j,1] = I_j - |C_{i,j}| \ge d[i,j,k]$ , for all k (the rules for filling in the table never increase the laxities except to set d[i,j,k] to  $I_j - |C_{i,j}|$ . Applying lemma 3-2 shows that  $\beta^*$  is also a solution.

The major implication of this theorem is that only cyclic strings need to be considered for static control structures. These strings can be enumerated, so the problem of finding a static control structure is in principal solvable. Since the proof also places an upper bound on the length of the cycle (equal to the total number of possible laxities at any position), so an algorithm that generated all possible strings would be effective in the sense that it would always halt in a finite amount of time. However, it would require computation time that grows exponentially with the complexity of the schema, so the problem would be computationally intractable if this were the only algorithm.

# 3.2: Generating Real-Time Control Structures

The problem of generating a feasible control structure is a scheduling problem. The problem is deterministic since the parameters of the problem are strictly bounded as opposed to being unbounded random variables. A wide varieties of special cases of the general scheduling problem have been studied, and some results are surveyed by Gonzalez [7], though relatively little work has been done

on scheduling in the presence of deadlines.

Gonzalez and Soh developed a simple algorithm that minimizes the number of processors used to schedule independent tasks. The tasks are statically assigned to processors and always run to completion. The deadlines for each task correspond to the period of the requests for that task and must be a power of two. Their algorithm is not optimal if the periods are not a power of two and no optimal algorithm is known, although several heuristic algorithms have been investigated.

Liu and Layland considered the problem of scheduling independent tasks on a single processor [14]. Each task requests service periodically with a deadline for service coinciding with the time for the next request. They present a method of assigning static priorities to the tasks that will meet the deadlines if any static assignment of priorities will. In addition, they prove the schedule which executes the task whose deadline is earliest is optimal in the sense it will meet the deadlines if any schedule will. They then prove necessary and sufficient conditions for a set of tasks to be scheduled by the earliest deadline (ED) algorithm to meet all its deadlines, and conclude that ED algorithm allows 100% utilization of the processor as opposed to figures as low as 70% for static priority algorithms.

Geiger extended the proof of the optimality of ED scheduling to include the case were the requests are not periodic [6]. Fiala presented the same basic proof and also derived necessary and sufficient conditions for the ED scheduler with a mix of periodic and aperiodic tasks [5].

Mok investigated scheduling independent tasks on multiple identical processors [16]. Mok shows that no optimal algorithm exists for this problem unless the

deadlines, computation times and at least some future request times are known. An algorithm related to the ED algorithm is presented which is shown to be optimal if all requests are simultaneous. This algorithm executes those tasks with the least laxity, where the laxity of a task is the deadline for the task minus its remaining computation time. Unfortunately, both the least laxity and ED schedulers are shown to be non-optimal even for tasks with periodic requests. However, the least laxity scheduler is optimal for periodic deadlines where tasks may be executed at any time (i.e. if the deadlines are coincident with the next request, the least laxity scheduler is optimal if it is allowed to execute tasks before they have been requested).

The problem of scheduling tasks related by a partial order on multiple identical processors has been studied by Manacher [16]. Deadlines are specified for any or all tasks in the system. Manacher's algorithm derives deadlines for all tasks in the system by using the observation that a task must complete executing in time to allow its successors to executed before their deadlines. The scheduler then executes those tasks with the earliest deadlines that have had all their predecessors executed. This algorithm is not optimal, and does not consider either periodic requests or multiple start-times. However, it is a reasonable heuristic, especially as the number of processors increase.

Unfortunately, none of these results generalize to the static control structure problem, even for a single processor, although control structures could be constructed which would meet the conditions of the particular special case and satisfy the real-time constraints. For example, if the block diagram consisted of unconnected (independent) blocks, the earliest deadline scheduler could be used

with task i being block i and the request period for each task being the minimum of  $I_i / 2$  and  $1 / B_i$ . The period between requests would have to be less than  $I_i / 2$  since (in the absence of other information) it is possible for the task to be executed immediately after one request and immediately before the following deadline. Lemma 3-1 says this time interval must not be greater than  $I_i$ .

On the other hand, these heuristics areliable to be overly restrictive, particularly since they tend to deal with independent tasks. It would be possible to derive independent tasks from a block diagram schema by treating the constraint paths as independent, but at the cost of introducing new blocks and much unnecessary computation. One promising approach for deriving a static control structure is to simulate some more general control structure until a cycle in the trace of that control structure is found. An obvious choice of a more general control structure is a least laxity scheduler (using laxities as defined for block diagram schema) which follows the partial order for the tasks (blocks) based on the constraint paths. More precisely, the scheduler would build a laxity table, with starred entries indicating constraints strings which cannot be fired because of the partial order. The scheduler chooses the first block of the unstarred constraint string with the smallest laxity to head the next column. If two constraints have the same laxity, either can be fired next. Figure 3-2 shows such a laxity table for the block diagram schema from figure 2-3 using the same latency specifications as figure 3-1.

At time 40, none of the latency specifications have been violated. However, since there are now two constraints with laxity 0, at least one entry in the next column will be negative. By firing C at time 10, an additional request for C is

|    | (O) | C<br>(10) | D<br>(20) | B<br>(25) | A<br>(30) | ? (40) |  |
|----|-----|-----------|-----------|-----------|-----------|--------|--|
| AB | 30  | *20       | *10       | *5        | 0         | *20    |  |
| В  | 40  | 30        | 20        | 15        | 35        | 0      |  |
| CD | 30  | 20        | *20       | 15        | 10        | 0      |  |
| D  | 40  | 30        | 20        | 35        | 30        | 20     |  |
| AD | 45  | *35       | *25       | 20        | 15        | *35    |  |
| D  | 55  | 45        | 35        | 50        | 45        | 15     |  |

$$t_A = 10 \ t_B = 5 \ t_C = 10 \ t_D = 5$$
 $l_{AB} = 45 \ l_{CD} = 45 \ l_{AD} = 60$ 

Counter-Example to Least Laxity Scheduling Figure 3-2

created with deadline 50. In the control robotics environment, the existence of this request makes scheduling impossible. However, if B is fired and C is delayed until time 15, the additional request also gets delayed to a point where it is possible to schedule all the requests. The least laxity algorithm simply does not deal with interactions between requests and deadlines.

It is interesting to note that the least laxity scheduler fails for this even if the constraint path AD is ignored. The remaining constraint paths AB and CD are independent, yet they cannot be scheduled using the ED algorithm using the worst-case period of  $I_i / 2$ . If periods are kept at  $I_i - |C_i|$ , the tasks still cannot be scheduled by the ED scheduler if the individual blocks are scheduled separately. The failure in this case can be viewed as an inability of the ED scheduler to derive the proper phase relation between the tasks.

The schedule shown in figure 3-3 is not the only least laxity schedule. For example, at time  $25 \, CD$  has the same laxity as B and therefore C could be fired instead of B. However, the reader can verify that all the least laxity schedules for

this example fall to satisfy the latency specifications.

### 3.3: A Branch-and-Bound Method for Generating Control Structures

Rather than generating acyclic control structures and looking for a cycle, the algorithm described in this section works by generating a cyclic control structure that satisfies the real-time specifications for one of the constraint paths. The solutions for other constraints paths are combined to form a control structure that satisfies all the real-time specifications. The basic semantics of firing blocks rules out control structures that are not shuffles of the constraint paths since these control structures perform redundant computations. Therefore, this algorithm should not miss any solutions. There are two major problems that the algorithm has to deal with: (1) How many times must each constraint path appear in one cycle of the total control structure. (2) How should the constraints paths be combined into one cycle.

#### 3.3.1: Determining the Relative Frequency of Constraint Paths

The first step in the algorithm is to determine how many times each constraint appears in one cycle of the total solution. Upper and lower bounds can be derived from the length of the cycle and the basic latency specification. Consider the lower bound on the number of appearances of constraint i: let  $k_i$  be the number of appearances of  $C_i$  in one cycle of the solution  $\omega^K$ . Let  $W_i = |C_i|$  and  $C_i = |\omega|$ . Since the latency specification for  $C_i$  requires  $C_i$  to appear at least once every

1,-w, time units:

$$k_j \ge \frac{c}{l_j - w_j} \tag{3-7}$$

This leaves c (the length of the cycle) to be determined. However, if  $C_i$  appears  $A_i$  times:

$$c \ge k_i w_i$$
 (3-8)

More precisely, the algorithm starts with the assumption that each block and constraint appears once and that  $c = \sum_{i=1}^{n} A_i$ . This approximation is used to derive  $k_i$  for all constraints in the schema. If any  $k_i$  increases, this is used to update the minimum number of times each block in the constraint must appear, which in turn may cause c to increase. This process continues until all  $k_i$  are consistent with c. In practice, this only takes a few iterations.

Theorem 3-4 places an upper bound on the number of blocks in a cycle, but this bound is not directly applicable to the branch and bound algorithm since the branch-and-bound algorithm does not try all cycles of a given length. An upper bound on the number of appearances of any constraint can be easily derived if the number of appearances of the other constraints is held constant.

First, an upper bound on the length of a cycle can be derived by applying equation 3-7 to all constraints except constraint i. Then the minimum weight of a cycle containing  $k_j$  appearances of  $C_j$  can be computed for all  $i \neq j$ . Letting  $c_{max}$  be the maximum allowed cycle weight and c be the minimum cycle weight (not including constraint i), the minimum weight of a cycle containing  $k_j$  appearances of

Determining the Relative Frequency of Constraint Paths

Section 3.3.1

C, is:

$$c + k_j w_j \tag{3-9}$$

Therefore, the upper bound on  $k_i$  can be derived by restricting the resultant cycle weight to be less than  $c_{max}$ :

$$k_i \le \frac{c_{max} - c}{w_i} \tag{3-10}$$

This ignores the possibility of blocks in  $\mathbf{C}_i$  already appearing in the cycle as part of other constraints. However, including more appearances of constraint i will eventually cause the minimum cycle length to exceed  $c_{max}$ .

This still does not bound the number of appearances for all constraints, since constraint i can appear more often if constraint j appears more often, etc. Placing an arbitrary bound on one constraint will also bound the number of appearances of all other constraints. For example, requiring at least one constraint to appear only once places a fairly tight bound on all constraint. However, it is not true that a solution of this type always exists. An example is shown in figure 3-3.

# 3.3.2: Strategies for Combining Solutions

Once the number of appearances per cycles of each constraint path is known, the constraint paths can be permuted to form a control structure which satisfies all the real-time specifications. Many of the techniques for improving the efficiency of 'branch-and-bound' optimization algorithms can be applied to this problem even though it is not an optimization problem. An optimization problem seeks a



$$t_A - t_B - t_C - t_D - t_E - t_F - 1$$
 $t_{a,g} \le 11$ 
 $t_{a,j} \le 15$ 
 $t_{b,j} \le 7$ 
 $t_{c,j} \le 10$ 

Control Structure: (ABFDECBFADEBFCF)\*

Block Diagram Where All Constraints Appear More Than Once Figure 3-3

permutation of n objects that maximizes an evaluation function f of the permutation.

A 'branch-and-bound' algorithm for this problem generates permutations for a

subset of the objects and extends these permutations to larger subsets. The permutations to the subsets are called partial solutions, and are arranged in a tree. Nodes in the tree correspond to partial solutions and the descendents of a node are the extensions of that partial solution. Branch-and-bound algorithms are often more efficient than direct enumeration since it is often unnecessary to examine the entire search tree. The key to pruning the search tree is the dominance relation on nodes of the tree. The evaluation function f can be extended to arbitrary nodes of the search tree by defining the value of a non-terminal node to be the maximum value of its descendants. Then node A dominates node B if and only if f(A) > f(B). The branch-and-bound algorithm may prune any subtree whose root node is dominated by some node of the tree that has already been explored.

In general, the dominance relation for a particular optimization problem cannot be computed without examining the entire tree. However, it is often easy to compute some weaker relation. These weaker relations are usually referred to as dominance relations in the literature, so we will use the term strong dominance relation to refer to the dominance relation that relates A to B if and only if f(A) > f(B).

Branch-and-bound algorithm vary in the order the tree is searched and how the dominance relations used to prune the search tree. Kohler and Steiglitz classified branch-and-bound algorithms and initiated the theoretical study of dominance relations [13]. They demonstrated the surprising result that pruning based on a stronger dominance relation does not always improve the efficiency of the algorithm. However, Ibaraki showed that stronger dominance relations do lead to more efficient

algorithms for several common classes of branch-and-bound algorithms [10].

Branch-and-bound algorithm as defined by Kohler and Steiglitz also make use of a function g that places a upper bound on the value of f at each node. If L is the maximum f(A) for <u>leaf</u> nodes A encountered, pruning sub-trees with  $g(A) \le L$  can only improve the efficiency of the algorithm. However, the upper bound function can also be viewed as a particular dominance relation.

The control structure problem as stated is not an optimization problem. However, it is still possible to define a dominance relation between nodes of the search tree: node A strongly dominates node B unless B leads to a valid control structure and A does not. Assuming the nodes at each level are generated in a random (lexicographic) order, the best pruning for the algorithm to use is to retain the node at each level which dominates the other nodes. If this dominance relation can be easily computed, the algorithm can generate a valid control structure without backtracking.

As a first step towards computing a dominance relation, define the slack for each constraint to be the difference between the latency requirement and the latency actually achieved by the control structure. The constraint with the least slack is the most critical constraint (MCC). The slack in the MCC could also be used as a value function to be maximized. If no control structure satisfies the real-time constraints, the control structure maximizing the slack in the MCC is probably a good 'close' solution. Also, the slacks may be used to evaluate any heuristic algorithms for deriving control structures.

The latency achieved by a static control structure for a constraint  $C_i$  is the weight of the most critical window for  $C_i$ . Adding a block to the cycle of the

control structure cannot increase any slacks since the weight of some critical window will be increased. The only exception would be if the new block completes an additional occurrence of some constraint path, thereby creating new critical windows. This cannot happen if the blocks being added are elements of some other constraint path, since no constraint path is contained in another constraint path. Therefore, the MCC slack can be used as an upper bound function in a branch-and-bound algorithm to maximize the MCC slack. Upper bound functions are also often used to guide the search in branch-and-bound algorithms. For example, the algorithm could always expand the node with the greatest upper bound.

If the slacks in each constraint are reduced by the same amount when a new block is added to the cycle, then the partial solution with the greatest MCC slack would be a dominant solution. Unfortunately, this is not generally the case. Consider dividing a cycle  $\omega$  of the control structure  $\omega^*$  into regions  $\phi_{i,j}$  and  $\xi_{i,j}$ , as shown in figure 3-4. The  $\phi_{i,j}$  regions contain one occurrence of  $C_i$ , but  $[\phi]$  contains no occurrence of  $C_i$ . The critical windows of  $C_i$  are  $\phi_{i,j}\xi_{i,j}\phi_{i,j+1}$ . Therefore, adding blocks to a  $\xi_{i,j}$  region increases the weight of  $\psi_{i,j}$ , and adding blocks to a  $\phi_{i,j}$  region increases the weight of  $\psi_{i,j-1}$  and  $\psi_{i,j}$ . Even if  $[\psi_{i,j}]$  increases, the slack for  $C_i$  will not decrease unless  $[\psi_{i,j}] = \max_i |\psi_{i,j}|$ . The slacks can not be used to compute a dominance relation since the interdependence of constraint paths may force new blocks to be added within the most critical window of some constraint, while another solution with a smaller MCC slack might have a critical window of the right size in the right place.

$$|\leftarrow \quad \psi_{i,1} \quad \rightarrow |$$

$$|\leftarrow \quad \psi_{i,1} \quad |\leftarrow \quad \psi_{i,2} \quad |\leftarrow \quad \psi_{i,2} \quad \rightarrow |$$

Regions of a Critical Window Figure 3-4

Keeping vectors of slacks for each constraint path does not correct the problem. Consider the example shown of figure 3-3 with the latency specification as shown in figure 3-6. It can be easily verified that  $(ADEFCADBC)^*$  is a feasible control structure for this schema. It is also the only feasible control structure 1. AD and CF must appear at least twice in one cycle of the solution. Figure 3-5 shows slacks for this constraints for two partial control structures. The merging of  $(ADAD)^*$  and  $(CFCF)^*$  that leads to the solution is  $(ADFCADFC)^*$ . However, the slacks for CF in  $(ADCFADCF)^*$  are larger and the slacks for AD are the same, so  $(ADCFADCF)^*$  would dominate  $(ADFCADFC)^*$  even though it doesn't lead to a solution.

### 3.3.3: Performance of the Algorithm

Assume each constraint path contains an average of k blocks. The slack of a constraint path in a trial cyclic solution can be determined in at most k scans of the cycle. If there are n constraint paths there will be o(nk) scans of each trial solution generated by the algorithm. The trial cycles will be o(nk) blocks long (this

<sup>1.</sup> This was verified by checking all cyclic control structures that might be generated by a branch-and-bound algorithm assuming that the least critical constraint only appears once per cycle.

$$|a,g| \le 7$$
  
 $|a,j| \le 14$   
 $|b,j| \le 12$   
 $|c,j| \le 10$ 

| Control      | Slack      |    |     |    |  |  |  |  |
|--------------|------------|----|-----|----|--|--|--|--|
| Structure    | Constraint |    |     |    |  |  |  |  |
| wantalasi sa | AD         | CF | AEF | BF |  |  |  |  |
| ADFCADFC     | 1          | 2  | -   | -  |  |  |  |  |
| ADCFADCF     | 1          | 4  | -   | -  |  |  |  |  |

Counter-Example to Slack as a Dominance Relation Figure 3-5

ignores the possibility of a constraint appearing several times in one cycle). The overall time complexity of the algorithm will be  $o(n^2k^2)$  times the number of trial cycles generated per problem.

Assume the trial cycle contains  $m_1$  blocks and the next constraint path contains  $m_2$  blocks. There are  $(m_1+m_2-1)!$  cycles containing all the blocks, but we are only interested in one of the  $m_1!$  permutations of the blocks in the old cycle, and  $(m_2-1)!$  permutations of the blocks in the new constraint (i.e. we must consider  $m_1$  different phase relations of the two cycles). Therefore, the number of different trail cycles generated at this step is:

$$\frac{(m_1 + m_2 - 1)!}{m_1!(m_2 - 1)!} = m_1 \binom{m_1 + m_2 - 1}{m_1}$$
 (3-11)

Of course, if some blocks of the new constraint are already contained in the old cycle, or if the next constraint appears more than once, not all of the generated cycles will be distinct. However, it is rather difficult to avoid generating these

cycles. There will be relatively little extra cost to the algorithm as long as it does not investigate cycles that are identical to cycles that have already lead to fallures. Therefore, the number of trial cycles generated by the merging algorithm when it finds a solution without backtracking is approximately:

$$\sum_{l=2}^{n} k \binom{kl-1}{k} \tag{3-12}$$

Equation (3-12) is  $o(kn^{k+1})$  since the binomial term in the sum is  $o(n^k)$  and there are n terms.

If the merging algorithm fails to find a solution, then it must have backtracked through each trial solution and the total number of cycles generated is:

$$(1+k\binom{2k-1}{k})(1+\cdots(1+k\binom{nk-1}{k})\cdots)$$
 (3-13)

which can be approximated:

$$\prod_{i=2}^{n} k \binom{ki-1}{k} \tag{3-14}$$

Equation (3-14) is  $o((kn^k)^n)$  or  $o(k^nn^{kn})$ , and is exponential in the number of blocks in the schema. This is a very loose upper bound and would only be achieved if all generated solutions were plausible except when the last constraint was being merged in. However, this bound is achievable if the first n-1 constraint paths had relatively large latency specifications while the last constraint path had relatively small latency specifications. This situation can be easily avoided by starting with the path with the smallest latency constraints relative to the weight of the path.

#### 3.3.4:

### Speeding up the Algorithm

There are many ways the average performance of the algorithm could be improved. For example, if we had a tighter lower bound on the slack in the MCC, we could prune more subtrees. We can get a tighter bound by determining what new blocks must be added to the control structure. Adding a new block always increases the size of some critical window for a constraint by at least the weight of the block. Therefore, if the sum of the slacks for a constraint is less than the total weight of blocks that must be added to the control structure, at least one of the critical windows for that path will exceed the latency specification for that path. This tighter bound has no effect on the performance if no backtracking is necessary. However, if no solution is found, using the tighter bound is roughly equivalent to reducing n, since fewer constraints need to be combined before the control structure is recognized as infeasible.

Notice that the performance of the algorithm would not be of polynomial complexity even if there were a dominance relation that totally ordered the possibilities at each level. The problem is that the number of partial solutions that must be generated by a naive algorithm can grow exponentially with the complexity of the schema. Therefore, finding a good dominance relation is not as important as finding a search function that generates nodes that are most likely to lead to a solution first.

Since the weight of the critical windows increase when new blocks are added, we might try merging in new constraint paths so that no new blocks are added before trying more general mergings. This will improve the performance if the solution is an extension of this type of merging, even if the algorithm must

backtrack since fewer nodes are generated on that level. If the algorithm must backtrack through all the control structures of this type, the performance of the algorithm is somewhat worse. The effect of this heuristic may be approximated by reducing k, since the length of the strings merged into the current control structure will be reduced.

The other way of improving the performance of the algorithm is to reduce the complexity of the problem. This can be done by replacing sub-graphs of the block diagram module with new blocks. Whenever the new block is fired, the blocks comprising the subgraph replaced by the new block are fired in some fixed order. This replacement can dramatically reduce k, and would improve both the best- and worst-case performance. However, combining blocks in this way can result in a schema which has no feasible control structures even though the original schema does.

Since the process of generating a control structure can be so time consuming, it would be extremely useful to quickly identify real-time specifications that are impossible to satisfy. One way of doing this is to compute the percentage of CPU time required by each block. If the sum of this percentage over all blocks in the schema is greater than 100%, the latency specifications are obviously unsatisfiable.

The percentage of the CPU required by each block is easily computed: each constraint  $C_i$  must be executed at least once every  $I_i - |C_i| + \epsilon$  time units. Therefore, each block  $c_{i,j}$  in  $C_i$  must be executed at least once every  $I_i - |C_i| + \epsilon$  time units and its corresponding CPU percentage is:

$$\frac{|c_{i,j}|}{|i-C_j|+\epsilon} \tag{3-15}$$

If an block appears in several constraints, its CPU percentage is the maximum of the percentage implied by each constraint the block appears in. Using the maximum rather than the sum corresponds to assuming that each time the block is fired it will help satisfy all the constraints it appears in. Although this is not necessarily the case, it is a lower bound on the CPU usage.

Another quick test for unsatisfiable latency specifications is that the slack in each latency specification must be larger than the computation time for all blocks not contained in that constraint path. Otherwise, the \( \xi \) portion of some critical window for that constraint will be too large (refer to figure 3-4).

### 3.3.5: Practical Experience

A branch-and-bound algorithm similar to the one described above has been implemented as part of a system for implementing continuous-time block diagrams on conventional micro-processors. The implementation runs on a PDP-11/70 under the UNIX timesharing system. The block diagram is described using an interactive graphics editor developed by John Pershing [18]. The branch-and-bound algorithm is only responsible for choosing the order to execute the blocks. The object code for the block diagram is produced by a separate program.

The program uses all of the heuristics mentioned above except it does not combine sub-graphs into new blocks. The program is able to find control structures to satisfy most latency specifications for small block diagrams using less than a minute of CPU time. So far, only one set of latency constraints has been found

where a valid control structure exists but no control structure was found by the program (see figure 3-3). Some latency specifications require more time to find a valid control structure.

In the absence of a fast optimal algorithm, it is preferable to have a fast algorithm which yields 'good' control structures quickly. Heuristic algorithms are generally evaluated one of two ways: one approach chooses a fixed algorithm and derives an upper (or lower) bound on how far the algorithm's solution is from the optimal solution. For example, Graham's algorithm for scheduling independent tasks on multiple processors executes tasks which require more processing time first. The resulting schedule is no more than 4/3 times as long as the optimal schedule [8].

The other approach develops a family of algorithms each requiring polynomial time. As the degree of the polynomial increases, the solutions found by the programs are closer to optimal. The family of algorithms is monotonic in the sense that the an algorithm taking more time never produces a poorer solution than one taking less time. If the degree of the polynomial were increased to infinity the algorithm would be optimal. However, it would also no longer be polynomially time bounded. An example is a series of scheduling algorithms employing limited lookahead [1].

The second approach does not seem applicable to the control structure problem. Limiting the breadth of back-tracking yields a family of exponential time algorithms with the exponent increasing with the amount of back-tracking. A family of polynomial algorithms would result if at most k blocks were merged at a time with no backtracking. However, these algorithms are very unsatisfactory if any

constraint must appear more than once. If the number of blocks in the constraint path is less than k, than all blocks for the second (and subsequent) appearance of the constraint will be merged coincident with the existing occurrences of those blocks. If k is increased so this does not happen, the performance of the algorithm is only slightly better than the complete algorithm with no backtracking.

## 3.4: Heuristics for Generating Control Structures

Steve Ward has experimented with some quick, simple heuristics for generating static control structures. Basically, the heuristic constructs control structures of the form  $(\alpha\beta\alpha\gamma\alpha\delta\cdots)^{*}$  where  $\alpha$  is the most critical constraint path and  $\beta$ ,  $\gamma$ ,  $\delta$ , et cetera are taken from the other constraint paths. More specifically, blocks from the next most critical constraint are added to  $\beta$  with the restriction that  $|\alpha\beta\alpha|$  is less than  $|\alpha|$ . If more blocks remain in the constraint they are added to  $\gamma$  so that  $|\alpha\gamma\alpha|$  is less than  $|\alpha|$ . Once all constraints have been merged in this way, the latency specifications are checked. If they are all satisfied then the generated string is a feasible control structure.

The heuristic will also call itself using the current solution as a so the generated solution may also be of the form:

Since these heuristics construct a control structure rather than search for one, they run very quickly. However, they also do not find solutions to a fairly large number of latency specifications, even for simple block diagrams. Still these heuristics are more attractive as a basis for an approximate algorithm, not only

because of their speed but also these heuristics could be extended to handle particular styles of block diagrams as the process of constructing control structures becomes better understood.

### 4: Static Priority Interrupt Control Structures

In some applications, the tokens at the input links do not change continuously. If the control structure can detect when an input changes, the real-time performance can be improved. Intuitively, this is possible since if no inputs to a block have changed, that block does not need to be executed. On the average, this type of control structure ought to do less computation and therefore ought to have better real-time performance. On the other hand, better average performance does not guarantee better worst-case performance and specific questions of performance must be answered with respect to a particular model.

Although the prototypical example of a dynamic control structure is interrupt driven, it is important to realize that hardware interrupts are not necessary. For example, a control structure could sample the inputs until one or more inputs change. After all the computation initiated as a result of these changes had completed, the control structure would continue to sample the inputs. In general, such a scheme would risk missing changes in the inputs. However, the control structure can use the real-time specifications to guarantee this will not happen.

#### 4.1: Dynamic Control Structures

Many of the strategies for scheduling independent tasks to satisfy real-time constraints mentioned in the previous chapter use dynamic control structures. For example, Liu and Layland use static priority interrupts and consider the case (in our terms) where the latency is equal to the period between requests [14]. They consider the earliest deadline scheduler only in this context although the earliest deadline schedule is optimal for any sequence of requests and deadlines, as mentioned earlier.

Given an optimal scheduler, is there any reason to consider a suboptimal

scheduler? The answer will be yes if a good suboptimal scheduler exists which uses less resources than the optimal scheduler. The earliest deadline scheduler needs to find the highest priority task to execute whenever a task completes (alternately, it needs to insert requests into the proper position in a task queue). A static priority interrupt control structure also needs to find the highest priority task to execute. However, this is done in hardware by many existing computers, including current microcomputers. Also, the earliest deadline scheduler requires a real-time clock to compute the deadlines for each task from the request time and the latency specification. Therefore, static interrupt control structures are sufficiently simpler than a earliest deadline control structure to deserve further consideration.

### 4.2: Model for Static Interrupt Control Structures

A static interrupt control structure associates a task with each block in the diagram. The tasks are related by a precedence relation consistent with the block diagram. Each task has a priority and may be idle, active, or requested. The priority may be thought of as an integer with numerically greater priorities being better.

When an input changes, all tasks whose blocks are watchers of that input become requested. The control structures chooses the task with the highest priority among the requested tasks. This task is active until the block complete executing when all its successor tasks become requested and the task itself becomes idle. If the control structure allows active tasks to be suspended while another task is executed the control structure is call preemptive. Otherwise it is

non-preemptive. Unless otherwise noted control structures are assumed to be preemptive.

The latency performance of any static interrupt control structure can be determined for each task by adding the computation time for that task to the maximum computation time used by higher priority tasks while the task is on the ready queue. The difficulty in this analysis is in determining how much computation might be used by other tasks.

The simplest case to consider is when all the tasks are *independent* (each task consists of exactly one block). Each task i requires  $t_i$  units of computation; and has priority  $p_i$ , latency  $l_i$ , and bandwidth  $B_i$ . Without loss of generality, the tasks can be numbered so that:

The overhead of associated with interrupts, selecting a task for execution, etc. will be ignored for the time being. We shall also assume that all priorities are distinct.

The latency for task / when its inputs change discretely is simply the maximum elapsed time between a change in an input and the termination of the task. This must be less than I, if the latency specification for task / is satisfied. The interpretation of the bandwidth specification is also simplified. Instead of specifying a minimum rate for <u>sampling</u> inputs, the bandwidth specifies the maximum rate at which an input changes.

The latency specification for task i will be satisfied if and only if the block for task i can be completely executed during any time interval of duration i. During

this interval, tasks with priority better than  $p_i$  will also be run, and the amount of CPU time used by higher priority tasks must be less than  $l_i - t_i$ .

Notice that this model is equivalent to the model used by Fiala. Fiala's  $P_i$  corresponds to  $t_i$ ,  $D_i$  corresponds to  $l_i$ , and  $l_i$  corresponds to  $1/B_i$ . Therefore, for a single processor we have the obvious restrictions:

$$t_j \le l_j \le \frac{1}{B_j} \tag{4-1}$$

and:

$$\sum_{j=1}^{n} \mathbf{B}_{j} \mathbf{t}_{j} \le 1 \tag{4-2}$$

The summands in (4-2) are the fraction of CPU time used by task i. Obviously the total fraction of the CPU used by all the tasks must be less than one. Equation (4-1) can be derived from (4-2).

Lemma 4-1: The amount of CPU time used by n independent tasks using a static priority scheduler in a window of duration  $\Delta t$  does not depend on the relative priority of the tasks.

<u>Proof:</u> The processor is always busy if some task is requesting service.

Changing the priorities of the tasks will never cause the processor to remain idle when some task requests service, nor will it affect when the tasks request service.

Since the control structure only executes a task if some input to the task changes, task / cannot be executed more often than once every 1/B, time units.

Clearly, a task uses the maximum CPU time if any interval if it requests service at this maximum rate.

Assume task i requests service at times 0,  $1/B_i$ ,  $2/B_i$ ,  $\cdots$ , and let  $C_i(t)$  be the maximum amount of CPU time used by task i in the interval (0, t). The highest

priority task (task 1) always starts executing immediately after it requests service and executes for  $t_1$  time units, so it will be executed  $\begin{bmatrix} B_1t \end{bmatrix}$  complete times in the interval. Let  $r = t - \begin{bmatrix} B_1t \end{bmatrix}$  be the amount of time at the end of the window after the last request for task 1. Task 1 will be executing during the interval (t-r,t) since task 1 has the highest priority. However, if  $r > t_1$ , only  $t_1$  units of computation will be used so:

$$C_{1}(t) = \left[B_{1}t\right]t_{1} + min\left(t_{1}, t - \frac{\left[B_{1}t\right]}{B_{1}}\right) \tag{4-3}$$

The maximum amount of CPU time used by task 1 in the interval ( $\Delta t$ ,  $t+\Delta t$ ) is:

$$C_1(t+\Delta t) - C_1(\Delta t)$$
 (4-4)

We will show that this is maximized when  $\Delta t = 0$  by showing:

$$C_1(t+\Delta t)-C_1(\Delta t) \leq C_1(t)$$

or

$$C_1(t+\Delta t) - C_1(t) \le C_1(\Delta t) \tag{4-5}$$

Since the requests for task 1 occur with a regular period,  $C_1(t)$  is also periodic. In fact:

$$C_1(t+1/B_1) = C_1(t) + t_1$$
 (4-6)

Therefore, we need only consider  $\Delta t$  between 0 and  $1/B_1$ , in which case:

$$C_{1}(\Delta t) = min(t_{1}, \Delta t) \tag{4-7}$$

This is the maximum amount of CPU time used by any interval of duration  $\Delta t$  since the CPU time used cannot be greater than the duration of the interval nor

can it be greater than  $t_1$  if the interval contains less than one period. Therefore, the inequality in (4-5) holds since the left hand side is the amount of CPU time used in an interval of duration  $\Delta t$  starting at t.

The worst case for a set of tasks will occur when all tasks request service at time 0 and continue requesting service at their respective maximum rates. This is true since the highest priority task will use its maximum amount of CPU time under these conditions, and by lemma 4-1, any task can be made the highest priority task without affecting the amount of CPU time used by the set of tasks.

Define  $C_{i}(t)$  by:

$$C_{i}(t) = \left[B_{i}t\right]t_{i} + min\left[t_{i}, t - \frac{\left[B_{i}t\right]}{B_{i}}\right]$$

The amount of CPU time used by tasks j and k is not necessarily  $C_j(t)$  summed over j and k. The difficulty is that if requests for tasks j and k occur sufficiently near the end of the window and of each other then only the higher priority task will actually be executed. Therefore, it is necessary to determine a precise schedule for the interval from 0 to t. However, if we are only interested in how much CPU time is used in this interval, lemma 4-1 assures us that we may assign arbitrary priorities to tasks j and k.

However, a sufficient condition for satisfying the latency specification for task /

$$l_{j} \ge t_{j} + \sum_{j=1}^{j-1} C_{j}(l_{j})$$
 (4-8)

This equation can be made more intuitive if the time required by task j is approximated by:

Model for Static Interrupt Control Structures

Section 4.2

Then equation (4-2) becomes:

$$I_{j} \ge t_{j} + I_{j} \sum_{j=1}^{j-1} B_{j} t_{j}$$
 (4-10)

This can be rewritten as:

$$I_{j} \ge \frac{t_{j}}{\int_{j=1}^{j-1} B_{j} t_{j}}$$
 (4-11)

The denominator in equation 4-11 represents the fraction of CPU time available to task i. The effect of higher priority tasks is equivalent to reducing the CPU speed.

# 4.3: Assigning Priorities to Independent Tasks

One of the weaknesses of traditional real-time operating systems based on static priority scheduling is that the system does not verify that the priorities assigned by the user are consistent with his real-time specifications. Even if the system checked these specifications, the user still must assign priorities, which do not have a simple relation to the real-time specifications. The obvious strategy of assigning the highest priority to the task that requires the fastest response time does not work. Consider the example in figure 4-1. Either task 1 or task 2 can run at the best priority since  $l_j \ge t_j$ . If  $p_j = 1/l_j$ , then  $p_1 > p_2$  and the the latency for task 2 is:

$$t_{2} + \left[ l_{2}B_{1} \right] t_{1} + min \left( t_{1}, l_{2} - \frac{\left[ l_{2}B_{1} \right]}{B_{1}} \right)$$

$$= 12 + \left[ \frac{16}{4} \right] 2 + min \left( 2, 16 - \left[ \frac{16}{4} \right] 4 \right)$$

$$= 12 + 8 + min(2,0)$$

$$= 20 \le l_{2} - 16$$

However, the latency for task 1 if  $p_2 > p_1$  is:

$$t_{1} + \lfloor l_{1}B_{2} \rfloor t_{2} + min \left( t_{2}, l_{1} - \frac{\lfloor l_{1}B_{2} \rfloor}{B_{2}} \right)$$

$$= 2 + \left\lfloor \frac{15}{24} \right\rfloor + min \left( 12, 15 - \left\lfloor \frac{15}{24} \right\rfloor 24 \right)$$

$$= 2 + 0 + min (12, 15)$$

$$= 14 \le l_{1} = 15$$

$$t_1 = 2$$
  $B_1 = \frac{1}{4}$   $I_1 = 15$   
 $t_2 = 12$   $B_1 = \frac{1}{24}$   $I_1 = 16$ 

## Counter-example to priority = 1 / latency Figure 4-1

The algorithm successively finds a task that can satisfy its latency specifications while assigned the lowest priority. If there are several such tasks, choose one arbitrarily. This task is assigned the lowest priority and removed from the set of tasks. The next task selected will be assigned a priority higher than all previously assigned priorities but lower than all tasks still unassigned. This continues until no task remains or no task can be found that can execute at a

priority lower than all other tasks. In this case, no assignment of static priorities will satisfy all the latency specifications using only one processor. This algorithm will never make a bad choice. Consider the situation when one or more tasks remain yet no task can be assigned the lowest priority. Any task that could possibly run at a lower priority has already been assigned a lower priority.

### 4.4: More Complex Models

The model for static interrupt control structures made several simplifying assumptions, such as ignoring scheduling overhead, assuming preemptive scheduling and distinct priorities. The model can be easily changed to account for different assumptions.

## 4.4.1: Scheduling Overhead

When a task requests service, the control structure must compare the priority of the task with the priority of the currently executing task. If the priority of the current task is higher, then new request must be queued in some manner. When any task completes execution, the control structure must select a new task to execute. Also, switching the processor between tasks will generally involve setting up some processor registers. However, all of these actions will occur for every instance of a task requesting service, so these overhead costs can be included in the maximum CPU time used by task  $l = t_j$ . The basic algorithm of finding a task which can be assigned the worse priority while still satisfying (4-8) is still correct.

# 4.4.2: Non-preemptive Control Structures

If the currently executing task always runs to completion before a new task is run, then the latency specification for a task must be large enough to allow for any task with worse priority to execute as well as the CPU time used by tasks with better priority. Thus, (4-6) becomes:

$$l_{j} \ge t_{j} + \sum_{j=1}^{j-1} C_{j}(l_{j}) + \max_{j=j+1}^{n} (t_{j})$$
 (4-12)

Again, the assignment algorithm does not require any changes. This is obvious if the algorithm finds a valid assignment of priorities. Increasing the priority of some task relative to task / moves a task into the summation term in equation (4-12). Since  $C_j(t)$  is greater than or equal to  $t_j$ , making this change can only increase the right hand side of (4-12).

## 4.4.3: Non-Distinct Priorities

For various reasons it may be desirable to assign several tasks identical priorities. For example, the computer hardware may only support a limited number of interrupt priorities. Since the control structure is free to execute any of the requested tasks having the highest priority, all tasks having the same priority as task / must be treated as if they had higher priorities when checking the latency specifications. This assumes that the control structure only executes task / when all other requested tasks have priorities strictly worse than p<sub>j</sub>.

However, this also makes the often unrealistic assumption that a task can be preempted by a task with equal priority. If this is not the case it is necessary to

simulate the control structure on the worst case sequence of requests. It is not sufficient to treat these tasks as if they had lower priority but are not preemptible since a pair of tasks can make a sequence of requests so that one of them requests service again while the other is being executed. Therefore, the first task can be executed twice while task *i* is waiting for service although task *i* is never preempted.

### 4.5: Applications to the Control Structure Problem

Verifying the real-time performance of a static priority scheduler on more complex task structures is a straightforward extension of the verification for independent tasks. A latency specification I, is satisfied if and only if all blocks in the constraint path can always be executed during any interval of duration I,. It becomes slightly more complex to compute the amount of CPU time used by higher priority tasks since some tasks (blocks) will not be runnable when other tasks are requested.

# 4.5.1: Chains of Independent Tasks

If no block appears in more than one constraint path, the constraint paths can be treated as independent tasks. A task will never be interrupted by a request of a predecessor if the real-time specifications are met since the period between requests is not less than the deadline for any one request.

The priority assignment problem would be very much more difficult if it were necessary to consider assigning different priorities to individual blocks in a chain.

However, it does not make sense to assign lower priorities to some blocks in the constraint path, since it makes no difference where in the chain higher priority tasks are allowed to interrupt. Therefore, all the tasks in the chain can be assigned the same priority as the task in the chain with the least priority.

In the presence of overhead it is more efficient to create one 'super-task' that executes all the blocks consecutively rather than incurring the overhead of a request for each block in the chain. However, if the control structure is non-preemptive it may be necessary to create several smaller 'super-tasks' to reduce the amount of time that must be spent waiting for low priority tasks to complete. Deciding how many tasks to create and how large to make them could be made on the basis of how much CPU time needs to be freed up in order to find a task to assign the currently worst priority.

#### 4.5.2: More Complex Task Relations

There are fundamentally two ways different constraint paths can have a common block: the common block can have more than one successor or it can have more than one predecessor. We will first consider the simplest example of each type of interdependent constraints.

Consider a block diagram in which block A has successors B and C. The constraint paths for this diagram are AB and AC. Since a request for A will always cause requests for both B and C,  $B_{AB} = B_{AC}$ . Therefore, neither B nor C will be interrupted by requests for A as long as the real-time specifications are met.

Now, if  $P_B > P_C$  then the sequence of blocks executed whenever A is requested

is ABC. Otherwise the sequence ACB will be executed. We can therefore replace the tasks A, B, and C by a task that executes either ABC or ACB. The latency specification for the new task should be chosen so that it will be satisfied if and only if the original latency specifications are satisfied. These latency specifications are satisfied if and only if:

$$I_{AB} \ge t_A + t_B + (time lost to interrupts)$$
 (4-13)

and

$$I_{AC} \ge t_A + t_C + (time lost to interrupts)$$
 (4-14)

The CPU time used by interrupting tasks will be identical for both the ABC and ACB sequence, except if ABC is executed, then B must be considered an interrupting task in equation (4-14), and similarly for C and equation (4-13). Therefore:

$$I_{ABC} = min(I_{AB}, I_{AC}^{-1}B)$$
 (4-15)

and

$$I_{ACB} = min(I_{AC}, I_{AB} - t_C)$$
 (4-16)

and we should choose the sequence that yields the greater latency.

Now consider a block diagram in which C has two predecessors A and B. The constraint paths for this block diagram are AC and BC. It is also quite possible to receive a request for C while C is already requested or suspended. However, If C was first requested by A, the additional request will always be from B and vice versa. If this occurs the logical thing to do is to have C executed only once, but in general the sequence AC will be executed whenever A is requested and BC will be requested whenever B is requested.

It is sufficient to replace A, B, and C by two tasks which executed AC and SC

respectively, ignoring the possibility that at times C may not need to be executed by one of the tasks. However, if no assignment of priorities is found treating these tasks as independent, it is not necessarily true that no such assignment would exist if the common block C were handled more carefully. The difficulty is that the worst case sequence of requests becomes harder to construct.

# 4.5.3: Combining Static and Dynamic Control Structures

Rather than having the processor idle when no tasks are requested, it may be possible to have the processor executing a static control structure for some portion of the block diagram. In this case we would consider the static control structure to be the lowest priority task. There are no real-time specifications on this task in the usual sense, although we must still guarantee the latencies in the static control structure. This can be done by modifying the latency specifications so that even when the maximum amount of CPU time is used by the dynamic tasks, the static control structure still runs often enough.

Consider a latency specification I, for C,. The blocks in C, must be executed once in every interval of duration I,. The trace of the processor is no longer completely determined by the static control structure since the dynamically scheduled tasks will interrupt the static control structure. However, the amount of CPU time used by these tasks is known. Therefore, we need only choose new latency specifications for the statically executed constraints according to the following equation:

Combining Static and Dynamic Control Structures

Section 4.5.3

$$I_j' = I_j - \sum_{j=1}^k c_j(I_j)$$
 (4-17)

Where constraints 1 through k are executed by the static priority interrupt control structure.

## **Multiple Processor Control Structures**

5:

So far we have only considered control structures using a single processor. This chapter discusses some of the issues involved in making use of more than one processor.

The first question to consider is how does using more than one processor improve the real-time performance of a block diagram schema? For static control structures, implementing constraint paths as control structures on separate processors improves both the latency performance and the bandwidth performance by decreasing the weights of the critical windows and decreasing the weight of the cycle. At the limit where each constraint path is implemented on its own processor, the latency performance for constraint  $i = C_i$  is  $2|C_i|$  and the bandwidth performance is  $1/|C_i|$ .

Similarly for dynamic control structures, if each constraint path were implemented on separate processors than each could run at the highest priority. The latency performance would be  $|C_j|$  and the bandwidth performance would be  $1/|C_j|$ .

However, these figures are not the best achievable. Each processor could execute only a single block, but then data must be transferred between processors. The interprocessor communication time may or may not be negligible depending on how the processors are interconnected. If data is transferred using an asynchronous serial transmission protocol, then at 9600 baud it would take about one millisecond to transfer one byte between processors. Data values are likely to take from one to four bytes, and a few milliseconds is a comparatively large time, even on relatively slow microcomputers. On the other hand, if the data is transmitted eight bits in parallel, the communication time may be negligible.

Even if the communication cost is negligible, executing a single block on each processor does not improve the latency performance when each processor is

-69-

running a static control structure. Consider the example shown in figure 5-1.



## A simple multi-processor control structure Figure 5-1

Assume processor one is executing  $A^*$  and processor two is executing  $B^*$ . The latency from a to b is  $2t_A$ . If the processors were synchronized so that processor two started executing B as soon as processor one finished executing A then the latency from a to c would be  $2t_A + t_B$ . The processors are not synchronized, but the phase difference is cannot be more than the period of either cycle. Therefore,  $t_{A,C}$  is  $2t_A + t_B + min(t_A, t_B)$ . But then if  $t_B$  is less than  $t_A$ ,  $t_{A,C}$  is  $2t_A + 2t_B$ , exactly the same as if processor one were executing  $(AB)^*$ . What, if anything, has been gained by using two processors? The latency performance has not been improved, but the bandwidth performance has been improved to  $min(1/t_A, 1/t_B) = 1/t_A$ .

For the constraint  $C_i = c_{i_1} \cdots c_{i_k}$  the latency for the entire path is:

$$\sum_{j=1}^{k} |c_{ij}| + |c_{ij}| + \sum_{j=1}^{k-1} \min(|c_{ij}|, |c_{ij+1}|)$$
 (5-1)

The first summation in (5-1) is the basic CPU time needed to execute the path.

The rest of (5-1) is the phase delay between processors.

If the interprocessor communication times were not negligible, they would

increase the phase delay between processors. Let  $IPCT_{j,j+1}$  be the interprocessor communication time between the processor where  $c_{i,j}$  is executed and the processor where  $c_{i,j+1}$  is executed. equation (5-1) becomes:

$$\sum_{j=1}^{k} |c_{ij}| + |c_{ij}| + \sum_{j=1}^{k-1} \min(|c_{ij}|, |c_{ij}|) + \sum_{j=1}^{k-1} (IPCT_{j,j+1})$$
 (5-2)

If processor two could be synchronized with processor one then the latency performance could be improved even more. Notice that for processor two, link b is an input link. However, the input signal at link b does <u>not</u> change continuously. Therefore, processor two should synchronize with processor one by executing a <u>dynamic</u> control structure. If processor two can keep up with processor one (i.e. If  $t_B \le t_A$ ) then the latency performance would improve to  $2t_A + t_B$ .

If processor two cannot keep up with processor one, the latency performance depends on whether or not processor one synchronizes with processor two. We will assume that requests for B occurring when B is executing are remembered and therefore processor two is executing  $B^*$ . If processor one does not wait for processor one, the processors are essentially be executing static control structures.

If processor one synchronizes with processor two by idling until processor two is ready to accept the next request, processor one still executes  $A^*$ , but A is executed once every  $t_B$  units instead of once every  $t_A$  units. The latency performance becomes  $2t_B + t_B$ . For the constraint  $C_i = c_i \cdots c_i$ , it is necessary to synchronize all k processors so each processor will idle until the next processor

is ready to accept a new request. Then the latency performance becomes:

$$2\max_{j=1}^{k} (|c_{ij}|) + \sum_{j=2}^{k} |c_{ij}|$$
(5-3)

The first term in (5-3) is the latency performance of the static control structure running on the first processor when synchronized to the slowest block in the constraint path. The bandwidth performance of the multi-processor control structure is:

$$\frac{1}{k}$$

$$\max_{j=1}^{k} |c_{j}|$$

$$(5-4)$$

If the interprocessor communications costs are not negligible, equation (5-3) becomes:

$$2\max_{j=1}^{k}(|c_{ij}|) + \sum_{j=2}^{k}|c_{ij}| + \sum_{j=1}^{k-1}(IPCT_{j,j+1})$$
(6-5)

Notice that in general, implementing each constraint path on a separate processor will improve both the bandwidth and latency performance of the control structure. Splitting a constraint path across several processors may not improve the latency performance, especially if the communication costs are not negligible. However, this will improve the bandwidth performance.

## 5.1: Assigning Control Structures to Multiple Processors

If the real-time specifications do not exceed the bounds implied by the equations derived in the previous section, then the specifications can be met by a control structure which assigns one block per processor. Although one can argue that computers are cheap, and getting cheaper all the time, they are not free.

Therefore, we are generally interested in finding a control structure that satisfies the real-time specifications that uses a minimal number of processors.

Unfortunately, this problem is computationally intractable. It has been shown that the problem of minimizing the number of processors for a dynamic control structure problem is NP-complete for the special case of independent tasks and deadlines coinciding with the next request for each task [4]. Also, Al Mok has discovered that the problem of minimizing the number of processors needed to implement a static control structure is also NP-complete [17].

On the other hand, experience with similar problems has shown that reasonable heuristics may exist. Dhall's work shows that statically assigning independent tasks to processors running an earliest deadline scheduler is directly equivalent to the bin-packing problem. Although this is an NP-complete problem, several heuristics are known that are sup-optimal by a bounded ratio. These algorithms are directly applicable to the scheduling problem. Assigning tasks to processors running static priority schedulers is not equivalent to the bin-packing problem, but Dhall has established similar bounds for simple first-fit and next-fit algorithms.

Therefore, it is reasonable to expect that similar bounds could be derived for algorithm that assigned constraints to processors using a first-fit or next-fit strategy. Some other factors should affect the assignment of tasks to processors. If the block diagram can be partitioned into disjoint subdiagrams, and the subdiagrams assigned to processors as a unit then no interprocessor communication is needed. However, the bandwidth or latency specifications on a constraint path may require that the blocks of the constraint path be split among several processors.

# 5.2: Dynamic Assignment of Processors

It doesn't make sense to assign processors dynamically if each processor is running a static control structure. However, if the processors are running dynamic control structures (i.e. if we have requests for tasks), then a control structure might do better by not assigning a processor to a task until it requests service. Unfortunately, there are no known algorithms for scheduling more than one processor in an optimal manner in the sense that the earliest deadline scheduler is optimal for a single processor. In fact, Mok has shown that such an algorithm must have knowledge about future requests. Unfortunately, limitations on the set of tasks that dynamic schedulers can guarantee to schedule to meet their deadlines are comparable to the restrictions imposed by statically assigning tasks to processors.

<sup>1.</sup> See [16] p87.

# **Summary and Conclusions**

We have presented a model for real-time computations that provides precise definitions of real-time performance. The model has the additional advantage of atrongly corresponding to intuition. This makes the model ideal for defining the semantics of a real-time programming language. The model also avoids close association with any implementation. Therefore, the model is applicable to a wide variety of systems. Conversely, a language based on this model should be easily implementable in a wide variety of ways, without encountering features of the model too finely tuned to a particular implementation.

Several strategies for implementing control structures for block diagram systems were investigated. The first strategy was to find a static execution order for the blocks in the diagram. Control structures of this type have been somewhat ignored for time critical applications. An important result is that any such control structure could be represented as a finite cycle, although the bounds on the length of the cycle are so large that explicit enumeration is impractical as a synthesis technique. A branch-and-bound synthesis method was developed, but unfortunately it is also impractical for large problems. We suspect that the synthesis problem is NP-complete (computationally intractable), but have not proved this conjecture. In any case, we believe it is more promising to investigate fast heuristic algorithms for synthesizing static control structures.

The next general strategy investigated made use of the fact that in many applications the input values change at discrete times. Under this assumption, block diagram schemata are closer to traditional models of real-time computations. Previous research has found optimal schedulers for the special case of one processor and independent tasks. However, simpler static priority schedulers had been ignored except for the special case of the latency specifications being

identical to the bandwidth period. We developed an efficient algorithm for assigning priorities to independent tasks when the latency specification is less than the bandwidth period. The synthesis techniques were modified to construct control structures for block diagram schemata in which the blocks were not independent.

Since the analysis of the real-time performance of block diagram schemata under a static priority control structure is similar to the analysis of static priority queueing systems, the priority assignment algorithm can also be applied to priority queueing systems.

Finally, we discussed some of the issues that arise when more than one processor is available to the control structure. The real-time performance of multiprocessor control structures was analyzed, and absolute bounds on the real-time performance for a block diagram schema were derived. If the real-time specifications can be met by a multiprocessor control structure, the objective becomes minimizing the number of processors needed to implement a feasible control structure. Several special cases are known to be NP-complete, so the general problem is also NP-complete. However, there is reason to believe that simple algorithms will produce control structures using a number of processors that differs from the minimal number by a bounded factor, although no specific algorithms were investigated.

Future work should probably concentrate on either proving various synthesis problems to be NP-complete or finding efficient algorithms. In the event the problems are intractable, the performance of efficient heuristic algorithms should be studied. Certainly any implementation of a practical language system based on block diagram schemata should attempt to find and improve such heuristic methods.

Summary and Conclusions

Section 6

A practical system should also attempt make use of more of the special cases for which efficient algorithms are known.

#### References

- [1] Appelbe, W.F. and Ito, M.R., Scheduling Heuristics in a Multiprogramming Environment, Research Report, Department of Computer Science and Department of Electrical Engineering, University of British Columbia, Vancouver, B.C., Canada, V6T 1W5 (February 1977).
- [2] Dennis, J.B., First Version of a Data Flow Procedure Language, TM-61, Project MAC, M.I.T., Cambridge MA, 02139 (May 1975).
- [3] Dertouzos, M.L., "Control Robotics: The Procedural Control of Physical Processes", Proceedings of the IFIP Congress 1974, pp 807-813.
- [4] Dhall, S.K., Scheduling Periodic-Time-Critical Jobs on Single Processor and Multiprocessor Computing Systems, UIUCDCS-R-77-859, Department of Computer Science, University of Illinois at Urbana-Champaign, Urbana, Illinois (April 1977).
- [5] Fiala, E., Scheduling of Real-Time Processes in a Time-Shared Environment, S.M. Thesis, M.I.T. Department of Electrical Engineering, Cambridge MA, 02139 (May 1968).
- [6] Geiger, S.P., A New Language Approach to Computerized Process Control, S.M. Thesis, M.I.T. Department of Electrical Engineering, Cambridge MA, 02139 (February 1974).
- [7] Gonzalez, M.J. Jr., "Deterministic Processor Scheduling", ACM Computing Surveys, Vol. 9, No. 3, pp173-204 (September 1977).
- [8] Graham, R.L., "Bounds on Multiprocessing Anomalies and Related Packing Algorithms", Spring Joint Computer Conference, 1972, pp205-217.
- [9] Hewitt, C. and Baker, H., "Laws for Communicating Parallel Processes", 1977 IFIP Congress Proceedings, pp987-992 (1977).
- [10] Ibaraki, T., "The Power of Dominance Relations in Branch-and-Bound Algorithms", Journal of the ACM, Vol. 24, No. 2, pp264-279 (April 1977).
- [11] Kelly, J.L. Jr., Lochbaum, C. and Vyssotsky, V.A., "A Block Diagram Compiler", Bell System Technical Journal, Vol. 40, No. 3, pp669-676 (May 1961).
- [12] Kieburtz, R.B. and Hennessy, J.L., "TOMAL A High Level Programming Language for Microprocessor Process Control Applications", ACM SIGPLAN Notices, Vol. 11, No. 4, pp127-134 (April 1976).

#### References

- [13] Kohler, W.H. and Steiglitz, K., "Characterization and Theoretical Comparison of Branch-and-Bound Algorithms for Permutation Problems", Journal of the ACM, Vol. 21, No. 1, pp140-156 (January 1974).
- [14] Liu, C.L., and Layland, J.W., "Scheduling Algorithms for Multiprogramming in a Hard-Real-Time Environment", Journal of the ACM, Vol. 20, No. 1, pp46-61 (January 1973).
- [15] Manacher, G.K., "Production and Stabilization of Real-Time Task Schedules", Journal of the ACM, Vol. 14, No. 3, pp439-465 (July 1967).
- [16] Mok, A.K., Task Scheduling in the Control Robotics Environment, TM-77, Laboratory for Computer Science, M.I.T., Cambridge MA, 02139 (September 1976).
- [17] Mok, A.K., private communication.
- [18] Pershing, J.A. Jr., Design of a Domain Specific Meta-Compiler for Systems Using Graphical Input as a Source Language, S.M. Thesis, M.I.T. Department of Electrical Engineering and Computer Science, Cambridge MA, 02139 (January 1978).
- [19] Real Time Disk Operating System User's Manual, 093-000075-01, Data General Corporation, Southboro, MA (1972).
- [20] RTX-16 Real Time Executive User's Manual, 88A0114A-E, General Automation, Incorporated, Anaheim, CA (1972).
- [21] Telxeira, T.J., High Level Language for Process Control, WP-7, Engineering Robotics Group, Project MAC, M.I.T., Cambridge MA, 02139 (January 1975).
- [22] Weng, K., Stream-Oriented Computation in Recursive Data Flow Schemas, TM-68, Project Mac, M.I.T., Cambridge MA, 02139 (October 1975).

### OFFICIAL DISTRIBUTION LIST

Defense Documentation Center Cameron Station Alexandria, VA 22314 12 copies

Office of Naval Research Information Systems Program Code 437 Arlington, VA 22217 2 copies

Office of Naval Research Branch Office/Boston 495 Summer Street Boston, MA 02210 1 copy

Office of Naval Research Branch Office/Chicago 536 South Clark Street Chicago, IL 60605 1 copy

Office of Naval Research Branch Office/Pasadena 1030 East Green Street Pasadena, CA 91106 1 copy

New York Area Office 715 Broadway - 5th floor New York, N. Y. 10003 1 copy

Naval Research Laboratory Technical Information Division Code 2627 Washington, D. C. 20375 6 copies

Assistant Chief for Technology Office of Naval Research Code 200 Arlington, VA 22217 1 copy

Office of Naval Research Code 455 Arlington, VA 22217 1 copy Dr. A. L. Slafkosky
Scientific Advisor
Commandant of the Marine Corps
(Code RD-1)
Washington, D. C. 20380
1 copy

Office of Naval Research Code 458 Arlington, VA 22217 1 copy

Naval Electronics Lab Center Advanced Software Technology Division - Code 5200 San Diego, CA 92152 1 copy

Mr. E. H. Gleissner
Naval Ship Research & Development Center
Computation & Math Department
Bethesda, MD 20084

1 copy

Captain Grace M. Hopper
NAICOM/MIS Planning Branch
(OP-916D)
Office of Chief of Naval Operations
Washington, D. C. 20350
1 copy

Mr. Kin B. Thompson
Technical Director
Information Systems Division
(OP-91T)
Office of Chief of Naval Operations
Washington, D. C. 20350
1 copy

Captain Richard L. Martin, USN Commanding Officer USS Francis Marion (LPA-249) FPO New York, N. Y. 09501 1 copy